Abstract

Measuring independence between two or more random variables is a fundamental problem that touches 
many areas of computer science. The problems of efficiently testing pairwise, or k-wise, independence
were recently considered by Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie (STOC 07); Alon, 
Goldreich and Mansour (IPL 03); Batu, Fortnow, Fischer, Kumar, Rubinfeld and White (FOCS 01); 
and Batu, Kumar and Rubinfeld (STOC 04). They addressed the problem of minimizing the number of 
samples needed to obtain sufficient approximation, when the joint distribution is accessible through a 
sampling procedure. 

A data stream model represents another setting where approximating pairwise, or k-wise, independence
with sublinear memory is of considerable importance. Unlike the work in the aforementioned papers,
in the streaming model the joint distribution is given by a stream of k-tuples, with the goal of testing
correlations among the components measured over the entire stream. In the streaming model, Indyk and 
McGregor (SODA 08) recently gave exciting new results for measuring pairwise independence. 

Statistical distance is one of the most fundamental metrics for measuring the similarity of two distri- 
butions, and it has been a metric of choice in many papers that discuss distribution closeness (see, for 
example, Rubinfeld and Servedio (STOC 05); Sahai and Vadhan (JACM 03); and the above papers). The 
Indyk and McGregor methods provide a log n-approximation under statistical distance between the joint
and product distributions in the streaming model. (In contrast, for the L2 metric, Indyk and McGregor
give a $(1 \pm \epsilon)$-approximation for the same problem, but for probability distributions, statistical
distance is a significantly more powerful metric than the L2 metric.) For the L1 metric, in addition to the
log n-approximation, Indyk and McGregor give an $\epsilon$-approximation that requires linear memory, and also
give a method that requires two passes to solve a promise problem for a restricted range of parameters. Indyk
and McGregor leave, as their main open question, the problem of improving their log n-approximation
for the statistical distance metric.



In this paper we solve the main open problem posed by Indyk and McGregor for the statistical distance
for pairwise independence and extend this result to any constant k. In particular, we present an
algorithm that computes an $(\epsilon, \delta)$-approximation of the statistical distance between the joint and
product distributions defined by a stream of k-tuples. Our algorithm requires
$O\big((\frac{1}{\epsilon}\log(\frac{nm}{\delta}))^{(30+k)^k}\big)$ memory and a single pass over the data stream.



1 Introduction 



Finding correlations between columns of a table is a fundamental problem in databases. Virtually all com- 
mercial databases construct query plans for queries that employ cross-dimensional predicates. The basic 
step is estimating "selectivity" (i.e., the number of rows that satisfy the predicate conditions) of the com- 
plex predicate. Without any prior knowledge, the typical solution is to compute selectivity of each column 
separately and use the multiplication as an estimate. Thus, optimizers make a "statistical independence 
assumption" which sometimes may not hold. Incorrect estimations may lead to suboptimal query plans and 
decrease performance significantly. Identifying correlations between database columns by measuring a level 
of independence between columns has a long history in the database research community. To illustrate this 
point, we cite as an example, Poosala and Ioannidis [39]: 

"For a query involving two or more attributes of the same relation, its result size depends on the 
joint data distribution of those attributes; i.e., the frequencies of all combinations of attribute 
values in the database. Due to the multi-dimensional nature of these distributions and the large 
number of such attribute value combinations, direct approximation of joint distributions can be 
rather complex and expensive. In practice, most commercial DBMSs adopt the attribute value 
independence assumption. Under this assumption, the data distributions of individual attributes 
in a relation are independent of each other and the joint data distribution can be derived from 
the individual distributions (which are approximated by one-dimensional histograms). Unfortunately,
real-life data rarely satisfies the attribute value independence assumption. For instance,
functional dependencies represent the exact opposite of the assumption. Moreover, there are 
intermediate situations as well. For example, it is natural for the salary attribute of the Em- 
ployee relation to be strongly dependent on the age attribute (i.e., higher/lower salaries mostly 
going to older/younger people). Making the attribute value independence assumption in these 
cases may result in very inaccurate approximations of joint data distributions and therefore 
inaccurate query result size estimations with devastating effects on a DBMS's performance."
For data warehouses, it is important to find correlated columns for correct schema construction, as
Kimball and Caserta note in [34]:

"Perfectly correlated attributes, such as the levels of a hierarchy, as well as attributes with a 
reasonable statistical correlation, should be part of the same dimension." 

In practice, typical solutions for finding correlations between columns are either histograms (see e.g.,
[39]) or sampling (see e.g., [30]). These methods have natural disadvantages: they do not tolerate
deletions and may require several passes over the data. When it comes to very large data volumes, it is
critical to maintain solutions that use sublinear memory, do not require additional passes over the
data, and can tolerate incremental updates of the data, e.g., deletions.

For these purposes, a theoretical data stream model can be useful. For data warehouses, the "loading" 
phase of the ETL process (see e.g., Kimball and Caserta [34]) can be seen as a data stream. When reading 
a database table, the process can be considered as a stream of data tuples. Thus, the data stream model 
represents another setting where approximating pairwise or k-wise independence with sublinear memory is
of considerable importance. 

1.1 Precise Definition of the Problem 

The natural way to model database tables in a streaming model is by considering a stream of tuples. In this
paper we consider a stream of k-tuples $(i_1, \ldots, i_k)$ where $i_l \in [n]$. (For simplicity, we assume that
elements of all columns are drawn from the same domain, even though our approach trivially extends to a
general case of different domains.) As pointed out in [39, 30, 32], the natural way to define a joint
distribution of two (or more) columns is given by the frequencies of all combinations of coordinates.
Similarly, the distribution of each column is defined by the corresponding set of frequencies; the definition
of a product distribution follows. Let us define these notions precisely.¹

¹ Here and thenceforth, we use lowercase Latin characters for indexes. We use an italic font for integers and
a boldface font for multidimensional indexes, e.g., $i \in [n]$ and $\mathbf{i} \in [n]^k$. For a multidimensional
index, we use a subscript to indicate its coordinate, e.g., $i_1$ indicates the first coordinate of $\mathbf{i}$.

Definition 1.1. Let D be a stream of elements $p_1, \ldots, p_m$, where each stream element is a k-tuple
$\mathbf{i} = (i_1, \ldots, i_k)$, where $i_l \in [n]$. A frequency of a tuple $\mathbf{i} \in [n]^k$ is defined as the
number of times it appears in D: $f_{\mathbf{i}} = |\{j : p_j = \mathbf{i}\}|$. For $l \in [k]$, an l-th margin frequency
of $t \in [n]$ is the number of times t appears as an l-th coordinate:
$f_l(t) = \sum_{\mathbf{i} \in [n]^k,\, i_l = t} f_{\mathbf{i}}$. A joint distribution is defined by a vector of
probabilities $P_{joint}(\mathbf{i}) = \frac{f_{\mathbf{i}}}{m}$, $\mathbf{i} \in [n]^k$, where m is the size of stream D.
An l-th margin distribution is defined by a vector of probabilities $P_l(t) = \frac{f_l(t)}{m}$, $t \in [n]$.
A product distribution is defined as: $P_{product}(\mathbf{i}) = \prod_{l=1}^{k} P_l(i_l)$, $\mathbf{i} \in [n]^k$.
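
The quantities in Definition 1.1 can be computed directly when memory is not a concern. The following is a minimal offline sketch, not a streaming algorithm: it stores all frequencies explicitly, which is exactly what the paper aims to avoid. The function name `distributions` and the toy stream are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import prod

def distributions(stream, n):
    """Return (P_joint, margin frequencies, P_product) for a stream of k-tuples over {1..n}."""
    m = len(stream)
    k = len(stream[0])
    f = Counter(stream)  # f[i]: frequency of tuple i (0 if i never appears)
    # margins[l][t]: number of stream elements whose l-th coordinate equals t
    margins = [Counter(p[l] for p in stream) for l in range(k)]
    indexes = list(product(range(1, n + 1), repeat=k))
    p_joint = {i: f[i] / m for i in indexes}
    p_product = {i: prod(margins[l][i[l]] / m for l in range(k)) for i in indexes}
    return p_joint, margins, p_product

# Toy stream (assumption): two perfectly correlated columns.
stream = [(1, 1), (1, 1), (2, 2), (2, 2)]
p_joint, margins, p_product = distributions(stream, n=2)
```

Here the joint distribution puts all mass on the diagonal, while the product distribution is uniform over all four pairs.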

Statistical distance is one of the most fundamental metrics for measuring the similarity of two distri- 
butions, and it has been a metric of choice in many papers that discuss distribution closeness (see e.g., 
(2lffl[ini[l2l[32l|41][40)). Given two distributions over a discrete domain, the statistical distance is half of 
the L1 distance between the probability vectors.

Definition 1.2. Consider two distributions over a finite domain $\Omega$, given by two random variables V, U.
Statistical distance $\Delta(V, U)$ is defined as:

$$\Delta(V, U) = \frac{1}{2} \sum_{x \in \Omega} |P(V = x) - P(U = x)| = \max_{B \subseteq \Omega} |P(V \in B) - P(U \in B)|.$$
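
The two expressions in Definition 1.2 can be compared by brute force on a small domain. This is only an illustrative check, assuming distributions given as explicit probability lists; the helper names are hypothetical, and the subset enumeration is exponential in the domain size.

```python
from itertools import combinations

def half_l1(p, q):
    """Half of the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def max_over_events(p, q):
    """Maximum over all events B of |P(V in B) - P(U in B)|, by enumeration."""
    domain = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for event in combinations(domain, size):
            gap = abs(sum(p[x] for x in event) - sum(q[x] for x in event))
            best = max(best, gap)
    return best

# Toy distributions over a 4-element domain (assumption).
p = [0.5, 0.25, 0.25, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
```

For these two vectors both expressions evaluate to 0.25, attained for instance by the event containing only the first element.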

In particular, one of the most common methods of measuring independence is computing statistical 
distance between product and joint distributions (see e.g., [10, 32]). This is precisely the way we define our
problem: 

Definition 1.3. An Independence Problem is the following: Given a stream D of k-tuples, approximate, with
one pass over D, with small memory and high precision, the statistical distance between the joint and product
distributions $\Delta(P_{joint}, P_{product})$.

In the streaming model, Indyk and McGregor [32] recently gave exciting new results for measuring
pairwise independence, i.e., for k = 2. To measure the independence, they consider two metrics: L2 and
L1. Recall that the L2 distance between two probability distributions is the L2 distance of their probability
vectors. In particular, the independence problem under the L2 metric is defined as $\|P_{joint} - P_{product}\|_2$.

For the L2 metric and k = 2, Indyk and McGregor give a $(1 \pm \epsilon)$-approximation using polylogarithmic
space. However, it is well known that for probability distributions, statistical distance is a significantly
more powerful metric than the L2 metric. For instance, consider two distributions on $[2n]$, where the first
distribution is uniform on $\{1, \ldots, n\}$ and the second is uniform on $\{n+1, \ldots, 2n\}$. In this case the
statistical distance is 1 but the L2 distance is $\sqrt{2/n} \to 0$. For example, Batu, Fortnow, Rubinfeld,
Smith and White [11] say:

"However, the L2-distance does not in general give a good measure of the closeness of two 
distributions. For example, two distributions can have disjoint support and still have small L2- 
distance." 
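
The disjoint-support example above is easy to reproduce numerically. The sketch below assumes the two uniform distributions are written out as explicit vectors over $[2n]$; the helper names are illustrative.

```python
from math import sqrt

def statistical_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def l2_distance(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def disjoint_uniform_pair(n):
    """Uniform on {1..n} and uniform on {n+1..2n}, both as vectors over [2n]."""
    p = [1 / n] * n + [0.0] * n
    q = [0.0] * n + [1 / n] * n
    return p, q

for n in (10, 1000):
    p, q = disjoint_uniform_pair(n)
    # Statistical distance stays at 1 while the L2 distance shrinks like sqrt(2/n).
    print(n, statistical_distance(p, q), l2_distance(p, q))
```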

For the statistical distance metric and k = 2, the Indyk and McGregor methods provide a log n-approximation
with polylogarithmic memory. In addition to the log n-approximation, Indyk and McGregor
give a $(1 \pm \epsilon)$-approximation that requires $\Omega(n)$ memory, and also give a method that requires two passes
to solve a promise problem for a restricted range of parameters. Indyk and McGregor leave, as their main
open question, the problem of improving their log n-approximation for the statistical distance metric.

In this paper we solve the main open problem posed by Indyk and McGregor for the statistical distance
for pairwise independence and extend this result to any constant k. In particular, we present an algorithm
that computes an $(\epsilon, \delta)$-approximation of the statistical distance between the joint and product
distributions defined by a stream of k-tuples. Our algorithm requires
$O\big((\frac{1}{\epsilon}\log(\frac{nm}{\delta}))^{(30+k)^k}\big)$ memory and a single pass
over the data stream. Theorem 2.5 formally describes our main result. We did not try to optimize the
constants in our memory bounds.

1.2 Implicit Tensors 

It is convenient to present an alternative, equivalent formulation of the independence problem as well. We 
can consider the problem of approximating the sum of absolute values of a tensor $M_{Ind}$.

Definition 1.4. An s-dimensional tensor M is an s-dimensional array with indexes in the range [n]; that is,
M has an entry for each $\mathbf{i} \in [n]^s$. We denote by $m_{\mathbf{i}}$ the $\mathbf{i}$-th entry of M for each
$\mathbf{i} \in [n]^s$.

Definition 1.5. Let M be an s-dimensional tensor with entries $m_{\mathbf{i}}$, $\mathbf{i} \in [n]^s$. An L1-norm of M is
$|M| = \sum_{\mathbf{i} \in [n]^s} |m_{\mathbf{i}}|$.

For example, a 1-dimensional tensor is an n-dimensional vector, a 2-dimensional tensor is an $n \times n$
matrix and so forth.
Many streaming problems address explicitly defined vectors (or matrices) where entries are equal to
frequencies of corresponding stream elements. The Independence problem diverges from this setting; e.g., for
pairwise independence, a pair (i, j) affects all entries in the i-th row and j-th column of the product
probability matrix. To reflect this important difference we consider the case where the entries of a tensor
are defined implicitly by a data stream.

Definition 1.6. Let $\mathcal{D}$ be a collection of data streams of size m of elements from domain $\Omega$. Let
$\mathcal{F} : \mathcal{D} \times [n]^s \to \mathbb{R}$ be a fixed function. We say that an s-dimensional tensor M with
entries $m_{\mathbf{i}} = \mathcal{F}(D, \mathbf{i})$, $\mathbf{i} \in [n]^s$, is implicitly defined by $\mathcal{F}$, given D.
We denote an implicitly defined tensor as $\mathcal{F}(D)$.

Definition 1.7. Let $\mathcal{D}$ be a collection of data streams of size m of k-tuples from domain $[n]^k$. A k-wise
Independence Function $\mathcal{F}_{Ind} : \mathcal{D} \times [n]^k \to \mathbb{R}$ is a function defined as
$\mathcal{F}_{Ind}(D, \mathbf{i}) = m^{k-1} f_{\mathbf{i}} - \prod_{l=1}^{k} f_l(i_l)$ for $\mathbf{i} \in [n]^k$. Here $f_l$ is
given by Definition 1.1. The statistical distance tensor $M_{Ind}$ is a k-dimensional tensor implicitly defined
by $\mathcal{F}_{Ind}$, i.e., $M_{Ind} = \mathcal{F}_{Ind}(D)$.

The main objective of our paper is approximating $|M_{Ind}|$. In particular, this implies solving the
Independence problem since $\Delta(P_{joint}, P_{product}) = \frac{1}{2m^k}|M_{Ind}|$, and since $m = |D|$ can be
computed precisely. We thus freely interchange the notions of the independence problem and computing
$|M_{Ind}|$. In fact, our approach is applicable to any function $\mathcal{F}$ for which the conditions of our main
theorems are true.
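
The relation between the statistical distance and the L1 norm of the independence tensor can be checked on a toy stream. The sketch below is an offline illustration only: it assumes the reading $m^{k-1} f_{\mathbf{i}} - \prod_{l} f_l(i_l)$ for the tensor entries and materializes every entry, which a streaming algorithm cannot afford. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

def frequencies(stream):
    k = len(stream[0])
    f = Counter(stream)
    margins = [Counter(p[l] for p in stream) for l in range(k)]
    return f, margins

def independence_tensor(stream, n):
    """Entries m^(k-1) * f_i - prod_l f_l(i_l), stored explicitly (offline only)."""
    m, k = len(stream), len(stream[0])
    f, margins = frequencies(stream)
    return {
        i: m ** (k - 1) * f[i] - prod(margins[l][i[l]] for l in range(k))
        for i in product(range(1, n + 1), repeat=k)
    }

def statistical_distance(stream, n):
    """Half the L1 distance between the joint and product distributions."""
    m, k = len(stream), len(stream[0])
    f, margins = frequencies(stream)
    return 0.5 * sum(
        abs(f[i] / m - prod(margins[l][i[l]] / m for l in range(k)))
        for i in product(range(1, n + 1), repeat=k)
    )

stream = [(1, 1), (1, 1), (2, 2), (1, 2)]  # toy stream (assumption)
m, k = len(stream), len(stream[0])
l1 = sum(abs(v) for v in independence_tensor(stream, n=2).values())
delta_from_tensor = l1 / (2 * m ** k)
delta_direct = statistical_distance(stream, n=2)
```

On this stream the tensor has L1 norm 8, and both routes give a statistical distance of 0.25.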

1.3 Why Existing Methods for Estimating L1 Do Not Work

Alon, Matias and Szegedy [5] initiated the study of computing norms of vectors defined by a data stream.
In their setting vector entries are defined by frequencies of the corresponding elements in the stream. Their
influential paper was followed by a sequence of exciting results including, among many others, works by
Bhuvanagiri, Ganguly, Kesh and Saha [14]; Charikar, Chen and Farach-Colton [17]; Cormode and
Muthukrishnan [21, 22]; Feigenbaum, Kannan, Strauss and Viswanathan [26]; Ganguly and Cormode [29]; Indyk
[31]; Indyk and Woodruff [33]; and Li [35, 36].

There is an important difference between the settings of [5] and the Independence problem. Indeed, while
the entries of the independence tensor are defined by frequencies of tuples, there is no linear dependence. 
As a result, the aforementioned algorithms are not directly applicable to the Independence problem. 
To illustrate this point, consider the celebrated method of stable distributions by Indyk [31]. For the L1
norm, Indyk observed that a polylogarithmic (in terms of n and m) number of sketches of the form
$\sum_i C_i v_i$ gives a $(1 \pm \epsilon)$-approximation of $|V|$, when the $C_i$ are independent random variables with
Cauchy distribution. Let us discuss the applicability of this method to the problem of pairwise independence.
A sketch $\sum_{\mathbf{i}} C_{\mathbf{i}} m_{\mathbf{i}}$, $\mathbf{i} \in [n]^2$, would solve this problem; unfortunately, it
is not clear how to construct a sketch in this form.
In particular, the probability matrix of the product distribution is given implicitly as two vectors of margin
sketches. It is not hard to construct sketches for margin distributions; however, it is not at all clear how
to obtain a sketch for the product distribution without using a multiplication of margin sketches. On the other
hand, if we do use a multiplication of margin sketches (this is the approach of Indyk and McGregor), the
random variable that is associated with the tensor's elements is a product of independent Cauchy variables.
Therefore, random variables for distinct entries are not independent, and thus typical arguments used for
stable distribution methods do not work anymore. In fact, the main focus of the Indyk and McGregor
analysis is to overcome this problem:

"Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different 
components of our estimates are sufficiently independent." 

For pairwise independence, Indyk and McGregor use the product of two Cauchy variables, where one of
them is "truncated." Using elegant observations, they show that such a sketch allows achieving a
log n-approximation of the statistical distance. Unfortunately, it is not clear how the method of a Cauchy
product can be improved at all, since the log n factor is a necessary component of their seemingly tight
analysis.
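
For intuition, the stable-distribution idea for an explicitly given vector can be sketched as follows. This is a simplified illustration, not the construction from [31] or [32]: it assumes fully independent Cauchy variables drawn from Python's `random` module, stores the vector explicitly, and uses the median of absolute sketch values as the estimator. The names `cauchy` and `l1_estimate` are hypothetical.

```python
import math
import random
from statistics import median

def cauchy(rng):
    """Standard Cauchy variate obtained through the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def l1_estimate(vector, num_sketches, seed=0):
    """Median of |sum_i C_i v_i| over independent sketches.

    Each sketch is distributed as |V| times a standard Cauchy variable,
    and the median of the absolute value of a standard Cauchy is 1.
    """
    rng = random.Random(seed)
    sketches = []
    for _ in range(num_sketches):
        sketches.append(sum(cauchy(rng) * v for v in vector))
    return median(abs(s) for s in sketches)

vector = [3.0, -1.0, 0.0, 4.0, -2.0]  # L1 norm is 10
estimate = l1_estimate(vector, num_sketches=2001)
```

Because each sketch is linear in the vector, it can be maintained under stream updates to individual entries. The difficulty described above is that the entries of the product-distribution matrix are not updated individually by stream elements.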

1.4 A Description of Our Approach 

As we discuss below, solving the Independence problem requires developing multiple new tools and using 
them jointly with known methods. 

Dimension Reduction for Implicit Tensors. Our solution can be logically divided into three steps which are
explained, informally, below.

First, we prove that given a polylog-approximation algorithm for k-dimensional tensors and an
$\epsilon$-approximation algorithm for a special type of (k − 1)-dimensional tensors, it is possible to derive an
$\epsilon$-approximation algorithm on k-dimensional tensors, where the resulting algorithm increases the memory
bound by a factor $O\big((\frac{1}{\epsilon}\log\ldots\big)$. Thus, we can trade dimensionality and precision for memory. To
illustrate this step, consider pairwise independence. There exist an $\epsilon$-approximation algorithm on vectors
[31] and a log n-approximation algorithm on matrices [32]. We show that these algorithms can be used to obtain
an $\epsilon$-approximation algorithm on matrices. This informal idea is stated precisely as Dimension Reduction
Theorem 2.1. This theorem is the main technical contribution of our paper; the majority of the paper is
devoted to establishing its validity.

Second, given a polylog-approximation algorithm for k-dimensional tensors and an $\epsilon$-approximation
algorithm on vectors, we can derive an $\epsilon$-approximation algorithm on k-dimensional tensors by applying
the Dimension Reduction Theorem recursively k times. The memory will be increased by a factor of roughly
$O\big((\frac{1}{\epsilon}\log\frac{nm}{\delta})^{(30+k)^k}\big)$, which is $(\frac{1}{\epsilon}\log\frac{nm}{\delta})^{O(1)}$ for
constant k. This informal idea is stated precisely as Theorem 2.2.

Third, we show that the conditions for Theorem 2.2 hold for the Independence problem. These results
are stated in Lemmas 2.4 and 2.3 and in fact are a generalization of results from [31, 32]. Section 6 is
devoted to the proof of these lemmas.

The rest of our discussion is devoted to a description of the main ideas behind the Dimension Reduction 
Theorem. 

Hyperplanes and Absolute Vectors. Consider a matrix M; a very natural idea to approximate $|M|$ is by
approximating the L1 norm of a vector with entries equal to the L1 norms of the rows of M. We generalize this
idea to tensors by defining the following operators.

Definition 1.8. For any s, t > 0, we denote by $(\cdot,\cdot)$ a mapping from $[n]^s \times [n]^t$ to $[n]^{s+t}$ obtained
by concatenation of coordinates. For instance, ((1, 2), 3) is an element from $[n]^3$ with coordinates 1, 2, 3
respectively.

Definition 1.9. Let M be an s-dimensional tensor with entries $m_{\mathbf{j}}$, $\mathbf{j} \in [n]^s$. For any $l \in [n]$,
Hyperplane(M, l) is a (s − 1)-dimensional tensor with entries $m_{(l,\mathbf{i})}$ for $\mathbf{i} \in [n]^{s-1}$.

For example, when k = 2, the l-th hyperplane of a matrix M is its l-th row.
Definition 1.10. An l-th hyperplane is $\alpha$-significant if $|Hyperplane(M, l)| \ge \alpha|M|$.

For example, when k = 2, the l-th row is $\alpha$-significant if the L1-norm of the vector defined by the l-th row
carries at least an $\alpha$-fraction of $|M|$.

Definition 1.11. For an s-dimensional tensor M, AbsoluteVector(M) is a vector of dimensionality n
with entries $|Hyperplane(M, l)|$, $l \in [n]$. In particular, $|AbsoluteVector(M)| = |M|$.
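
Definitions 1.9 to 1.11 can be illustrated on explicitly stored tensors. The sketch below assumes a tensor is represented as a dictionary from index tuples (with coordinates in 1..n) to values; this representation and the function names are illustrative choices, not part of the paper.

```python
def l1_norm(tensor):
    """Sum of absolute values of all entries."""
    return sum(abs(v) for v in tensor.values())

def hyperplane(tensor, l):
    """Fix the first coordinate to l and drop it from the index."""
    return {index[1:]: v for index, v in tensor.items() if index[0] == l}

def absolute_vector(tensor, n):
    """Vector whose l-th entry is the L1 norm of the l-th hyperplane."""
    return [l1_norm(hyperplane(tensor, l)) for l in range(1, n + 1)]

def is_significant(tensor, l, alpha):
    """True if the l-th hyperplane carries at least an alpha-fraction of |M|."""
    return l1_norm(hyperplane(tensor, l)) >= alpha * l1_norm(tensor)

# 2 x 2 matrix with rows (1, -2) and (0, 7).
M = {(1, 1): 1, (1, 2): -2, (2, 1): 0, (2, 2): 7}
```

For this matrix the absolute vector is `[3, 7]`, its entries sum to $|M| = 10$, and the second row is 0.7-significant.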

Projected Dimensions. To prove Dimension Reduction Theorem 2.1 we need to map s-dimensional tensors
to (s − 1)-dimensional tensors with a small distortion of L1. We come up with the following mapping.

Definition 1.12. Let M be an s-dimensional tensor with entries $m_{\mathbf{i}}$, where $\mathbf{i} \in [n]^s$, and let
$0 < t \le s$. A Suffix-Sum tensor $T_t(M)$ is a (s − t)-dimensional tensor with entries (for each
$\mathbf{i} \in [n]^{s-t}$)

$$m'_{\mathbf{i}} = \sum_{\mathbf{j} \in [n]^t} m_{(\mathbf{j}, \mathbf{i})}.$$

Also, we define $T_0(M) = M$. In other words, the $\mathbf{i}$-th entry of $T_t(M)$ is obtained by summing all elements
of M with the (s − t)-suffix equal to $\mathbf{i}$. In particular, $T_s(M)$ is a scalar that is equal to
$\sum_{\mathbf{i} \in [n]^s} m_{\mathbf{i}}$.

For a matrix M with entries $m_{i,j}$, the Suffix-Sum operator $T_1(M)$ defines a vector V with entries
$v_j = \sum_i m_{i,j}$. In other words, all entries of M that belong to the same column (i.e., have the same
second coordinate, i.e., the same "suffix") are "summed-up" to generate a single entry of V. In some sense, the
Suffix-Sum operator is orthogonal to the Absolute Vector operator. In the latter case we sum up the absolute
values that belong to the same hyperplane, i.e., have an identical prefix; in the former case we sum up all
elements (and not their absolute values) that have an identical suffix.

Clearly $|T_1(M)| \le |M|$; however, it is possible in general that $|T_1(M)| \ll |M|$. The key observation is
that in some cases $|T_1(M)| \approx |M|$ and thus we can use an approximation of $|T_1(M)|$ to approximate $|M|$.
To illustrate this point, consider a matrix M with entries $m_{i,j}$ that contains a very "significant" row i
(i.e., $\sum_j |m_{i,j}| \approx |M|$). The key observation is that in this case $|T_1(M)| \approx |M|$; thus, if there is a
significant row, it can be approximated using $|T_1(M)|$. The same idea is easily generalized: if an
s-dimensional tensor M contains a $(1 - \epsilon)$-significant hyperplane Hyperplane(M, l), then $|T_1(M)|$ is a
$2\epsilon$-approximation of $|Hyperplane(M, l)|$. We prove this statement in Fact 3.6.

Note that $T_1(M)$ is a (s − 1)-dimensional tensor; if M is a matrix, then $T_1(M)$ is a vector for which we
can apply methods from [31]. Thus, approximating $|T_1(M)|$ is potentially an easier problem.
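
The behavior of the Suffix-Sum operator can be seen on two small matrices: one with a dominant row and one where the column sums cancel. The sketch below again assumes tensors stored as dictionaries keyed by index tuples; the matrices and helper names are illustrative. By the triangle inequality, $|T_1(M)|$ differs from the norm of any single row by at most the total norm of the remaining rows, which is the bound checked in the example.

```python
def l1_norm(tensor):
    return sum(abs(v) for v in tensor.values())

def suffix_sum(tensor, t=1):
    """T_t(M): sum out the first t coordinates, keeping the (s - t)-suffix."""
    out = {}
    for index, value in tensor.items():
        suffix = index[t:]
        out[suffix] = out.get(suffix, 0) + value
    return out

def matrix_to_tensor(rows):
    return {(r + 1, c + 1): v for r, row in enumerate(rows) for c, v in enumerate(row)}

# Row 1 carries 120 out of |M| = 123, so T_1 nearly preserves its norm.
heavy = matrix_to_tensor([[50, -40, 30], [1, 0, -1], [0, 1, 0]])
heavy_row_norm = 50 + 40 + 30

# No dominant row: column sums cancel completely although |M| = 4.
cancelling = matrix_to_tensor([[1, -1], [-1, 1]])
```

For `heavy`, the column sums are 51, -39 and 29, giving $|T_1(M)| = 119$ against a heavy-row norm of 120. For `cancelling`, $|T_1(M)| = 0$.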

Certifying Tournaments. We have shown that $T_1(M)$ can be useful for approximating $|M|$. However, when
can we rely on the value of $|T_1(M)|$? In particular, how can we distinguish between the cases when there
is a heavy hyperplane (and thus $|T_1(M)|$ is a good approximation) and the case when there is no heavy
hyperplane (and thus $|T_1(M)|$ does not contain reliable information)? The second key observation is that it
can be done using "certifying tournaments." To illustrate this point, consider again the case k = 2, where
M is a matrix. Split M into two random sub-matrices by sampling the rows w.p. 1/2. If there is a heavy
row, then with probability close to 1, one sub-matrix will have a significantly larger norm than the other.
Recall that the method of [32] gives us a log n-approximation. Thus, for very heavy rows, the ratio between
approximations of norms obtained by the method from [32] will be large. On the other hand, we show that if
there are no heavy rows, then such behavior is quite unlikely to be observed many times. Thus, there exists 
a way to distinguish between the first and the second cases for (1 — log £ s ra ) -significant rows 

The method of certifying tournaments can be generalized to any $s \le k$ as follows. Let M be an
s-dimensional tensor with entries $m_{\mathbf{i}}$ for $\mathbf{i} \in [n]^s$. We "split" M into two "sampled" s-dimensional
tensors $M^0$ and $M^1$ by randomly sampling the first coordinate. That is, $M^1$ has entries $m_{\mathbf{i}} H(i_1)$ and
$M^0$ has entries $m_{\mathbf{i}}(1 - H(i_1))$, where $H : [n] \to \{0, 1\}$ is pairwise independent and uniform. If there
exists a $\beta$-approximation algorithm for sampled tensors, and there exists an $\epsilon$-approximation algorithm for
Suffix-Sum, $|T_1(M^0)|$ and $|T_1(M^1)|$, then we can approximate the L1 norm of significant hyperplanes. Indeed,
if there exists a significant hyperplane $M_l$ of M, then the ratio between $\beta$-approximations of $|M^0|$ and
$|M^1|$ will be large. If this is the case, the approximation of $|T_1(M^{H(l)})|$ is also an $\epsilon$-approximation of
$|M_l|$.
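
A single "split-and-compare" round can be simulated on explicit matrices. The sketch below makes several simplifying assumptions: norms are computed exactly rather than by a $\beta$-approximation algorithm, the hash function is a table of fully independent random bits rather than a pairwise independent family, and only one round is run instead of a tournament. Names are hypothetical.

```python
import random

def l1_norm(tensor):
    return sum(abs(v) for v in tensor.values())

def split_by_first_coordinate(tensor, h):
    """Return (M0, M1): entries m_i * (1 - H(i_1)) and m_i * H(i_1)."""
    m1 = {index: v * h[index[0]] for index, v in tensor.items()}
    m0 = {index: v * (1 - h[index[0]]) for index, v in tensor.items()}
    return m0, m1

def norm_ratio(tensor, h):
    """Ratio between the larger and the smaller of |M0| and |M1|."""
    m0, m1 = split_by_first_coordinate(tensor, h)
    small, large = sorted((l1_norm(m0), l1_norm(m1)))
    return float("inf") if small == 0 else large / small

n = 64
rng = random.Random(1)
h = {l: rng.randrange(2) for l in range(1, n + 1)}  # random row sampling

flat = {(r, c): 1 for r in range(1, n + 1) for c in range(1, n + 1)}
heavy = dict(flat)
for c in range(1, n + 1):
    heavy[(1, c)] = 10_000  # row 1 dominates the norm
```

With a dominant row, whichever half receives that row has a norm far larger than the other half, so the ratio is large for any choice of the hash function. For the flat matrix the two halves are typically of comparable size.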

To summarize, our main technical Theorem 4.3 proves that it is possible to output a number U such that
U is either an approximation of some hyperplane or 0. Further, if there exists a (1 — -m) -significant hyper- 
plane, then with high probability, U is its approximation. We call such an algorithm an a-ThresholdMax 
algorithm, for a = 0{-^). 

Indirect Sampling. Many streaming algorithms compute statistics on sampled streams, which are random
subsets of D defined by some randomness $\mathcal{H}$. In many cases, a sampled stream directly corresponds to a
collection of sampled entries of a frequency vector. In contrast, subsets of D do not correspond directly to
entries of $M_{Ind}$. Thus, our algorithms employ indirect sampling, where randomness defines sampled entries
of $M_{Ind}$ rather than the entries of a data stream D. We define a Prefix-Zero operator.

² It is worth noting that the idea of "split-and-compare" is not new. Group testing [22] exploits a similar
approach. However, the methods from [22] require an $\epsilon$-approximation of L1; in contrast, we use certifying
tournaments to improve the approximation.

Definition 1.13. Let M be an s-dimensional tensor with entries $m_{\mathbf{i}}$, $\mathbf{i} \in [n]^s$, and let
$H_1, \ldots, H_t$, $t \le s$, be hash functions $H_j : [n] \to \{0, 1\}$. A Prefix-Zero tensor
$W(M, H_1, \ldots, H_t)$ is an s-dimensional tensor with entries $m_{\mathbf{i}} \times \prod_{j=1}^{t} H_j(i_j)$.

Our algorithms work with tensors that are defined by compositions of $\mathcal{F}_{Ind}$, Prefix-Zero and Suffix-Sum.
We thus extend the definition of implicitly defined tensors.

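Definition 1.6 (Revised). Let $\mathcal{D}$ be a collection of data streams of size m of elements from domain
$\Omega$ and let $\mathfrak{H}$ be a collection of hash functions from [n] to {0, 1}. Let
$\mathcal{F} : \mathcal{D} \times \mathfrak{H}^t \times [n]^s \to \mathbb{R}$ be a fixed function, for some $0 \le t \le s$. We say that an
s-dimensional tensor M with entries $m_{\mathbf{i}} = \mathcal{F}(D, \mathcal{H}, \mathbf{i})$, $\mathbf{i} \in [n]^s$, is implicitly
defined by $\mathcal{F}$, given $D \in \mathcal{D}$ and $\mathcal{H} \in \mathfrak{H}^t$. We denote an implicitly defined tensor as
$\mathcal{F}(D, \mathcal{H})$.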

Example 1.14. Consider k = 2. Then $\mathcal{F}'(D, H) = W(\mathcal{F}_{Ind}(D), H)$ defines a matrix that represents a
collection of rows sampled by a hash function $H : [n] \to \{0, 1\}$.
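
The Prefix-Zero operator is easy to state on explicit tensors. The sketch below assumes the reading of Definition 1.13 in which the entry at index $\mathbf{i}$ is multiplied by $H_j(i_j)$ for each of the first t coordinates; tensors are dictionaries keyed by index tuples and hash functions are lookup tables, both illustrative choices.

```python
from math import prod

def prefix_zero(tensor, hashes):
    """W(M, H_1, ..., H_t): zero every entry whose j-th coordinate has H_j = 0 for some j <= t."""
    return {
        index: value * prod(h[index[j]] for j, h in enumerate(hashes))
        for index, value in tensor.items()
    }

# 3 x 3 matrix with entry 10 * row + column, and a hash that drops row 2.
matrix = {(r, c): 10 * r + c for r in (1, 2, 3) for c in (1, 2, 3)}
H = {1: 1, 2: 0, 3: 1}
sampled = prefix_zero(matrix, [H])
```

With a single hash function on a matrix, as in Example 1.14, the result keeps rows 1 and 3 and replaces row 2 with zeros. With no hash functions the tensor is unchanged.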

Generalizing the Method of Indyk and Woodruff [33] to Work on Implicit Vectors. The ThresholdMax algorithm
solves a problem that resembles the well-known problem of finding an element with maximal frequency,
see, e.g., [17] and EQ. The celebrated method of Indyk and Woodruff [33] uses maximal entries
to estimate $L_p$ norms on vectors defined by frequencies. We apply the ideas of [33] to approximate
$|AbsoluteVector(M)| = |M|$.

Unfortunately, the method of Indyk and Woodruff [33] is not directly applicable since some basic tools
available for frequency vectors (such as L2 norm approximation) cannot be used. We propose a different
algorithm which is still in the same spirit as [33]; it can be found in Section 5. We prove Lemmas 5.5 and
5.3 which state that the existence of an $\alpha$-ThresholdMax algorithm for an implicitly defined vector V implies
the existence of an $(\epsilon, \delta)$-approximation algorithm for $|V|$, with memory increased by an additional factor of

±poly{\ log *f). 

Other Technical Issues. There are several other rather technical issues that need to be resolved. We need to
prove that the methods of Indyk [31] and Indyk and McGregor [32] are applicable for k-dimensional tensors
that are obtained from $M_{Ind}$ by applying Prefix-Zero and Suffix-Sum operators. The proofs can be found in
Section 6. To prove our main theorems, certain properties of the operations on tensors should be established.
We prove these in Section 3.

1.5 Related Work 

Measuring pairwise independence between two or more random variables is a fundamental problem that
touches many areas of computer science. The problems of efficiently testing pairwise, or k-wise,
independence were recently considered by Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie [2]; Alon,
Goldreich and Mansour [4]; Batu, Fortnow, Fischer, Kumar, Rubinfeld and White [10]; and Batu, Kumar
and Rubinfeld [12]. They addressed the problem of minimizing the number of samples needed to obtain
sufficient approximation, when the joint distribution is accessible through a sampling procedure. Unlike the
work in [2, 4, 10, 12], in the streaming model, the joint distribution is given by a stream of tuples.

Many exciting results have been reported in the streaming model, including, for example, Alon, Duffield,
Lund and Thorup [3]; Alon, Matias and Szegedy [5]; Bagchi, Chaudhary, Eppstein and Goodrich [9];
Bar-Yossef, Jayram, Kumar and Sivakumar [7]; Bar-Yossef, Kumar and Sivakumar [8]; Beame, Jayram
and Rudra [13]; Bhuvanagiri, Ganguly, Kesh and Saha [14]; Chakrabarti, Khot and Sun [16]; Charikar,
O'Callaghan and Panigrahy [18]; Coppersmith and Kumar [19]; Cormode, Datar, Indyk and Muthukrishnan
[20]; Datar, Immorlica, Indyk, and Mirrokni [23]; Duffield, Lund and Thorup [24]; Feigenbaum, Kannan,
McGregor, Suri and Zhang [25]; Gal and Gopalan [27]; Ganguly [28]; Indyk [31]; Indyk and McGregor
[32]; Indyk and Woodruff [33]; Mitzenmacher and Vadhan [37]; Sun and Woodruff [42]; and Szegedy [43].
For a detailed discussion of the streaming model, we refer readers to the excellent surveys of Aggarwal (ed.)
[1]; Babcock, Babu, Datar, Motwani and Widom [6]; and Muthukrishnan [38].

In our recent work [15], we also address the problem of k-wise independence for data streams. In
contrast to the current paper, in [15] we study the L2 norm and use entirely different techniques.
1.6 Roadmap

Section 2 describes the main theorems of the paper. In Section 3 we show some useful properties of
Suffix-Sum and Prefix-Zero. Section 4 contains the proof of the Tournament algorithm. Section 5 contains a
generalization of the ideas of Indyk and Woodruff [33] to implicit vectors. Finally, Section 6 generalizes
methods of Indyk [31] and Indyk and McGregor [32] to work with sampled portions of $M_{Ind}$.

2 Main Theorems 

The proof of our result is based on three main steps which are summarized by the following theorems. The 
remainder of this paper is devoted to establishing these theorems. 

Theorem 2.1. Dimension Reduction for Implicit Tensors 

Let s > 1 and let M be an s-dimensional tensor with poly(n, m)-bounded entries that is defined by a
function $\mathcal{F}$, i.e., $M = \mathcal{F}(D, \mathcal{H})$ where D is a data stream and $\mathcal{H}$ is a fixed randomness. Let
$H : [n] \to \{0, 1\}$ be an arbitrary fixed hash function. Assume that

1. There exists an algorithm $\mathfrak{A}(D, \mathcal{H}, H, \delta)$ that, given D and an access to $\mathcal{H}$ and H, in one
pass obtains a $(\log^k(n), \delta)$-approximation of $|W(M, H)|$;

2. There exists an algorithm $\mathfrak{B}(D, \mathcal{H}, H, \epsilon, \delta)$ that, given D and an access to $\mathcal{H}$ and H, in
one pass obtains an $(\epsilon, \delta)$-approximation of $|T_1(W(M, H))|$;

3. Both algorithm require memory v(n, m, e, 5) < 0((- log ™p^ 30+fe ) ^ beyond the memory required 
for H and 7i. 

Then there exists an algorithm that in one pass obtains an (e, 5)-approximation of \M\ using memory 

(I log rm,)(30 +fc )*+i_ 

Proof. Follows from Theorem [431 Lemma [531 Lemma [531 and elementary computations. 

Indeed, the assumptions of Theorem 2.1 imply, by Theorem 4.3, the existence of a $\log^{2k}$-ThresholdMax
algorithm (see Definition 4.2) for the restricted function $\mathcal{F}' = AbsoluteVector(\mathcal{F}(D, \mathcal{H}))$. The
existence of a ThresholdMax algorithm implies, by Lemma 5.3, the existence of a Cover algorithm
for $AbsoluteVector(\mathcal{F}(D, \mathcal{H}))$. The assumption that the entries of M are polynomially bounded and
Fact 3.7 imply that the entries of $AbsoluteVector(\mathcal{F}(D, \mathcal{H}))$ are polynomially bounded as well. Thus,
by Lemma 5.3, there exists an $(\epsilon, \delta)$-approximation algorithm for $|AbsoluteVector(\mathcal{F}(D, \mathcal{H}))|$.
Finally, $|AbsoluteVector(\mathcal{F}(D, \mathcal{H}))| = \sum_{l \in [n]} |Hyperplane(\mathcal{F}(D, \mathcal{H}), l)| = |M|$.

After substituting the parameters, the memory required is less than (for sufficiently large $n$)



1 1 e 7 e 17 

-^log(-)log 2A;+2C \nm)v(n,m,—^ -) < 

e dU o log 4 (nm) log (n)log 8 (mn) 



1 nm\ 

7 log — J 



(30+fe) s 



□ 



Theorem 2.2. Approximation Theorem for Tensors 

Let $M$ be a $k$-dimensional tensor with entries bounded by poly$(n, m)$ and implicitly defined by a function
$\mathcal{F}(D)$. Assume that

1. There exists an algorithm $\mathfrak{B}_s(D, H_1, \ldots, H_s)$ (for some $s < k$) that, given $D$ and access to fixed
hash functions $H_1, \ldots, H_s$, in one pass obtains an $(\epsilon, \delta)$-approximation of $|T_s(W(M, H_1, \ldots, H_s))|$;

2. There exist algorithms $\mathfrak{A}_{s_1,s_2}(D, H_1, \ldots, H_{s_1})$ (for any $0 \le s_2 < s_1 \le s$) that, given $D$ and
access to the $H_i$s, in one pass obtain a $(\log^k(n), \delta)$-approximation of $|T_{s_2}(W(M, H_1, \ldots, H_{s_1}))|$;

3. All algorithms use memory bounded by $O\big((\frac{1}{\epsilon} \log \frac{nm}{\delta})^{20}\big)$, beyond the memory required for the $H_i$s.

Then there exists an algorithm that in one pass obtains an $(\epsilon, \delta)$-approximation of $|M|$ using memory
$O\big((\frac{1}{\epsilon} \log \frac{nm}{\delta})^{(30+k)^k}\big)$.



Proof. Define $g(x) = (\frac{1}{\epsilon} \log \frac{nm}{\delta})^{(30+k)^{k-x}}$. First, we show that for any $s_1 \le s$ there exists an algorithm
$\mathfrak{B}_{s_1}(D, H_1, \ldots, H_{s_1})$ that gives an $(\epsilon, \delta)$-approximation of $|T_{s_1}(W(M, H_1, \ldots, H_{s_1}))|$ and uses memory
at most $g(s_1)$.

We prove this fact by induction on $s_1$. For $s_1 = s$, the fact follows from the first assumption of the theorem
since $g(s) \ge (\frac{1}{\epsilon} \log \frac{nm}{\delta})^{20}$. For $s_1 < s$, denote $\mathcal{F}'(D, H_1, \ldots, H_{s_1}) = T_{s_1}(W(\mathcal{F}(D), H_1, \ldots, H_{s_1}))$.
Denote $M' = \mathcal{F}'(D, H_1, \ldots, H_{s_1})$ and let $H$ be an arbitrary hash function. By Corollary 3.4,

$W(M', H) = W(\mathcal{F}'(D, H_1, \ldots, H_{s_1}), H) = W(T_{s_1}(W(\mathcal{F}(D), H_1, \ldots, H_{s_1})), H) = T_{s_1}(W(M, H_1, \ldots, H_{s_1}, H))$. (1)

Thus, and by the second assumption of the theorem, there exists an algorithm that in one pass
obtains a $(\log^k(n), \delta)$-approximation of $|W(M', H)|$ using memory less than or equal to $g(s_1 + 1)$.
Also, by Corollary 3.5 and by (1):

$T_1(W(M', H)) = T_1(T_{s_1}(W(M, H_1, \ldots, H_{s_1}, H))) = T_{s_1+1}(W(M, H_1, \ldots, H_{s_1}, H))$. (2)

By induction, there exists an algorithm that gives an $(\epsilon, \delta)$-approximation of
$|T_{s_1+1}(W(M, H_1, \ldots, H_{s_1}, H))| = |T_1(W(M', H))|$ using memory $g(s_1 + 1)$.

$M'$ is implicitly defined by a fixed function $\mathcal{F}'(D, H_1, \ldots, H_{s_1})$. By Fact 3.7, its entries are polynomially
bounded. Thus, by (1) and (2), all assumptions of Theorem 2.1 are satisfied for $M'$. Therefore, there exists
an algorithm that gives an $\epsilon$-approximation of $|M'| = |T_{s_1}(W(M, H_1, \ldots, H_{s_1}))|$ using memory $g(s_1)$.

In particular, there exists an algorithm that for any $H$ gives an $\epsilon$-approximation of $|T_1(W(M, H))|$
using memory $g(1)$. Also, by the second assumption of the theorem, there exists an algorithm that gives a $\log^k(n)$-approximation
of $|T_0(W(M, H))| = |W(M, H)|$. Thus, we can apply Theorem 2.1 for $M$ and obtain an
$\epsilon$-approximation of $|M|$. The resulting memory usage will be $O\big((\frac{1}{\epsilon} \log \frac{nm}{\delta})^{(30+k)^k}\big)$. □

The following lemmas are proven in Section 6.

Lemma 2.3. There exists an algorithm $\mathfrak{B}_{k-1}$ that, given a data stream $D$ and access to hash functions
$H_1, \ldots, H_{k-1}$, in one pass obtains an $\epsilon$-approximation of $|T_{k-1}(W(M_{Ind}, H_1, \ldots, H_{k-1}))|$ using memory
O(^logilog^). 

Lemma 2.4. There exists an algorithm $\mathfrak{A}_{s_1,s_2}$ (for any $0 \le s_2 < s_1 < k$) that, given a data
stream $D$ and access to hash functions $H_1, \ldots, H_{s_1}$, in one pass obtains a $\log^k n$-approximation of
$|T_{s_2}(W(M_{Ind}, H_1, \ldots, H_{s_1}))|$ using memory $O(\log(nm) \log \frac{1}{\delta})$.

Theorem 2.5. Main Theorem Let $k \ge 2$ be a constant, and let $D$ be a stream of $k$-tuples from $[n]^k$.
For any $0 < \epsilon < 1$, there exists an algorithm that makes a single pass over $D$ and returns an $(\epsilon, \delta)$-approximation
of the statistical distance between product and joint distribution (see Definition 1.1) using
memory $O\big((\frac{1}{\epsilon} \log(\frac{nm}{\delta}))^{(30+k)^k}\big)$.

Proof. By Lemma 2.3 and Lemma 2.4, the algorithms required by Theorem 2.2 exist for $M_{Ind}$. Also, by
Fact 3.7, the entries of $M_{Ind}$ are polynomially bounded. Thus all assumptions of Theorem 2.2 are true for
$M_{Ind}$. Applying Theorem 2.2 to $M_{Ind}$, we obtain the main result. □
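
To make the quantity in Theorem 2.5 concrete, the following is a minimal offline sketch that computes the $L_1$ gap between the empirical joint distribution of a stream of $k$-tuples and the product of its empirical marginals. It is only a brute-force baseline for small inputs, not the streaming algorithm of this paper: it stores the whole distribution. The function name `l1_independence_gap` is hypothetical, and whether Definition 1.1 normalizes this sum (for example by a factor of 1/2) is an assumption left open here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def l1_independence_gap(stream):
    """Return the sum over tuples t of |joint(t) - prod_s marginal_s(t_s)|.

    Offline baseline: memory grows with the support of the distribution.
    """
    stream = list(stream)
    m = len(stream)
    k = len(stream[0])
    joint = Counter(stream)
    # One frequency table per coordinate of the k-tuples.
    marginals = [Counter(t[s] for t in stream) for s in range(k)]

    gap = Fraction(0)
    # The product distribution lives on the product of the marginal supports,
    # which also contains the support of the joint distribution.
    for t in product(*(sorted(mg) for mg in marginals)):
        p_joint = Fraction(joint[t], m)
        p_prod = Fraction(1)
        for s in range(k):
            p_prod *= Fraction(marginals[s][t[s]], m)
        gap += abs(p_joint - p_prod)
    return gap


independent = [(a, b) for a in (0, 1) for b in (0, 1)]
correlated = [(0, 0), (1, 1)]
gap_independent = l1_independence_gap(independent)
gap_correlated = l1_independence_gap(correlated)
```

For the uniform stream over all four pairs the gap is zero, while for the fully correlated stream the joint distribution puts mass 1/2 on two pairs and the product puts mass 1/4 on all four, so the gap is 1.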

3 Properties of Tensors 

We prove the following useful facts about Suffix-Sum and Prefix-Zero operations. 

Fact 3.1. Let $M$ be a $t$-dimensional tensor and $0 \le s < t$. Then

$W(T_s(M), H) = T_s(W(M, H_1 = 1, \ldots, H_s = 1, H))$.

Proof. Denote by $m_w$ (for $w \in [n]^t$) the $w$-th entry of $M$. For any $i \in [n]^{t-s}$ denote by $a_i$ the $i$-th entry of
$T_s(M)$. By Definition 1.12,

$a_i = \sum_{j \in [n]^s} m_{(j,i)}$.

Denote by $b_i$ the $i$-th entry of $W(T_s(M), H)$. By Definitions 1.12 and 1.13,

$b_i = H(i_1) a_i = \sum_{j \in [n]^s} m_{(j,i)} H(i_1)$.

Denote by $c_i$ the $i$-th entry of $T_s(W(M, H_1 = 1, \ldots, H_s = 1, H))$. By Definitions 1.12 and 1.13,

$c_i = \sum_{j \in [n]^s} m_{(j,i)} H(i_1)$.

Thus, for any $i$, $b_i = c_i$ and the fact is correct. □

Fact 3.2. Let $M$ be a $t$-dimensional tensor and let $0 \le s < t$. Then $T_1(T_s(M)) = T_{s+1}(M)$.

Proof. Denote by $m_w$ (for $w \in [n]^t$) the $w$-th entry of $M$. For $j \in [n]^{t-s}$ denote by $b_j$ the $j$-th entry of $T_s(M)$.
By Definition 1.12,

$b_j = \sum_{u \in [n]^s} m_{(u,j)}$.

For every $i \in [n]^{t-s-1}$, denote by $c_i$ the $i$-th entry of $T_1(T_s(M))$. By Definition 1.12,

$c_i = \sum_{z \in [n]} b_{(z,i)} = \sum_{z \in [n]} \sum_{u \in [n]^s} m_{(u,(z,i))} = \sum_{z \in [n]} \sum_{u \in [n]^s} m_{(u,z,i)} = \sum_{v \in [n]^{s+1}} m_{(v,i)}$.

For any $i \in [n]^{t-s-1}$ denote by $a_i$ the $i$-th entry of $T_{s+1}(M)$. By Definition 1.12,

$a_i = \sum_{v \in [n]^{s+1}} m_{(v,i)}$.

Thus, for any $i$, $a_i = c_i$ and the fact is correct. □

Fact 3.3. Let $M$ be a $t$-dimensional tensor, let $s < t$ and let $H_1, \ldots, H_s$ and $G_1, \ldots, G_s$ be hash functions.
Then

$W(M, H_1 G_1, \ldots, H_s G_s) = W(W(M, H_1, \ldots, H_s), G_1, \ldots, G_s)$.

Corollary 3.4. Let $M$ be a $t$-dimensional tensor and let $0 \le s < t$. Let $M' = T_s(W(M, H_1, \ldots, H_s))$.
Then

$W(M', H) = T_s(W(M, H_1, \ldots, H_s, H))$.

Proof. Denote $M'' = W(M, H_1, \ldots, H_s)$. Then by Fact 3.1,

$W(M', H) = W(T_s(M''), H) = T_s(W(M'', G_1 = 1, \ldots, G_s = 1, H))$.

Also by Fact 3.3,

$W(M'', G_1, \ldots, G_s, H) = W(W(M, H_1, \ldots, H_s, 1), G_1, \ldots, G_s, H) = W(M, H_1, \ldots, H_s, H)$.

□ 
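
The identities above can be checked numerically on a small tensor. The sketch below is an illustration under assumptions: Definitions 1.12 and 1.13 are not restated in this section, so the operators are implemented as they are used in the proofs of Facts 3.1 and 3.2 ($T_s$ sums out the first $s$ coordinates; $W$ multiplies the entry $m_w$ by $H_1(w_1) \cdots H_r(w_r)$). The helper names `T`, `W`, `H1` and `H` and the test tensor are hypothetical.

```python
from itertools import product


def T(M, n, t, s):
    """Sum out the first s coordinates of a t-dimensional tensor over [n]^t."""
    out = {i: 0 for i in product(range(n), repeat=t - s)}
    for w, value in M.items():
        out[w[s:]] += value
    return out


def W(M, hashes):
    """Multiply entry m_w by H_1(w_1) * ... * H_r(w_r) for the given 0/1 hashes."""
    out = {}
    for w, value in M.items():
        keep = 1
        for hash_fn, coord in zip(hashes, w):
            keep *= hash_fn(coord)
        out[w] = value * keep
    return out


def H1(x):
    """Hypothetical 0/1 hash on the first coordinate."""
    return x % 2


def H(x):
    """Hypothetical 0/1 hash on the second coordinate."""
    return 0 if x == 1 else 1


n, t, s = 3, 3, 1
M = {w: w[0] + 2 * w[1] - 3 * w[2] for w in product(range(n), repeat=t)}

# Fact 3.2: T_1(T_s(M)) = T_{s+1}(M).
lhs_32 = T(T(M, n, t, s), n, t - s, 1)
rhs_32 = T(M, n, t, s + 1)

# Corollary 3.4 with s = 1: W(T_1(W(M, H1)), H) = T_1(W(M, H1, H)).
M_prime = T(W(M, [H1]), n, t, 1)
lhs_34 = W(M_prime, [H])
rhs_34 = T(W(M, [H1, H]), n, t, 1)
```

Because the entries are integers, both comparisons are exact; dictionaries keyed by index tuples keep the sketch independent of any array library.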

Corollary 3.5. Let $M$ be a $t$-dimensional tensor and let $0 \le s < t$. Let $M' = T_s(W(M, H_1, \ldots, H_s))$.
Then

$T_1(W(M', H)) = T_{s+1}(W(M, H_1, \ldots, H_s, H))$.

Proof. By Fact 3.2 and Corollary 3.4,

$T_{s+1}(W(M, H_1, \ldots, H_s, H)) = T_1(T_s(W(M, H_1, \ldots, H_s, H))) = T_1(W(M', H))$.

□

Fact 3.6. Let $M$ be an arbitrary $s$-dimensional tensor, let $M_l$ be a $(1 - \epsilon/2)$-significant hyperplane of $M$,
$M_l = Hyperplane(M, l)$, and let $M' = T_1(M)$. Then $|M'|$ is an $\epsilon$-approximation of $|M_l|$.

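Proof. We have

$|M'| = \sum_{i \in [n]^{s-1}} \big|\sum_{j \in [n]} m_{(j,i)}\big| \le \sum_{i \in [n]^{s-1}} \sum_{j \in [n]} |m_{(j,i)}| = |M| \le \frac{1}{1 - \epsilon/2} |M_l| \le (1 + \epsilon)|M_l|$.

On the other hand,

$|M'| = \sum_{i \in [n]^{s-1}} \big|\sum_{j \in [n]} m_{(j,i)}\big| \ge \sum_{i \in [n]^{s-1}} \big(|m_{(l,i)}| - \sum_{j \in [n], j \ne l} |m_{(j,i)}|\big) = |M_l| - (|M| - |M_l|) = 2|M_l| - |M|$

$\ge \big(2 - \frac{1}{1 - \epsilon/2}\big)|M_l| \ge (1 - \epsilon)|M_l|$.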

□ 
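
A small numeric instance of Fact 3.6 may help. The sketch below assumes a 2-dimensional tensor stored as a list of rows, where row $l$ plays the role of $Hyperplane(M, l)$ and $T_1$ adds the rows entrywise, and it reads "$\epsilon$-approximation" as lying within a factor $(1 \pm \epsilon)$; the function name `check_fact_3_6` is hypothetical.

```python
def l1(values):
    """L1 norm of a flat list of numbers."""
    return sum(abs(v) for v in values)


def check_fact_3_6(M, l, eps):
    """M is a 2-dimensional tensor given as a list of rows (hyperplanes)."""
    norm_M = sum(l1(row) for row in M)
    norm_Ml = l1(M[l])
    # Hypothesis of the fact: hyperplane l is (1 - eps/2)-significant.
    assert norm_Ml >= (1 - eps / 2) * norm_M
    # T_1 sums out the first coordinate, i.e. adds the rows entrywise.
    column_sums = [sum(col) for col in zip(*M)]
    norm_T1 = l1(column_sums)
    within = (1 - eps) * norm_Ml <= norm_T1 <= (1 + eps) * norm_Ml
    return within, norm_T1, norm_Ml


M = [[10, -8, 6],
     [1, 0, 0],
     [0, -1, 0]]
ok, norm_T1, norm_Ml = check_fact_3_6(M, l=0, eps=0.2)
```

Here $|M| = 26$ and $|M_0| = 24$, so the first hyperplane is $0.9$-significant, and $|T_1(M)| = 26$ indeed lies between $0.8 \cdot 24$ and $1.2 \cdot 24$.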

Fact 3.7. The following is correct:

1. Let $M$ be an $s$-dimensional tensor with polynomially bounded (in $n$ and $m$) entries for $s \le k$. Let $M'$
be a tensor obtained from $M$ by an arbitrary composition of Prefix-Zero, AbsoluteVector, Hyperplane
and Suffix-Sum operators. Then the entries of $M'$ are polynomially bounded.

2. All entries of $M_{Ind}$ are integers with absolute values bounded by $2m^k$ and thus claim 1 is true for
$M_{Ind}$.

Proof. The first claim follows from the fact that the entries of $M'$ are sums of disjoint subsets of $M$ and that
the number of entries in $M$ is bounded by $n^k$. The second claim follows from Definition 1.7. □

4 Certifying Tournaments



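Algorithm 4.1. TensorTournament($D$, $\mathcal{H}$, $H$, $\epsilon$)

1. Repeat in parallel $O(\frac{1}{p} \log \frac{1}{\delta})$ times, where $p = 1 - \sqrt{1 - \epsilon/2}$:

(a) Generate a 2-wise independent random hash function $Z$ from $[n]$ to $\{0, 1\}$ such that $Z(i) = 1$
w.p. 0.5. Denote $Z_1 = HZ$, $Z_0 = H(1 - Z)$.

(b) Compute in a single pass over $D$, for $i = 0, 1$: $t_i = \mathfrak{B}(D, \mathcal{H}, Z_i, \epsilon, \delta')$, where $\delta' = \frac{p\delta}{4 \log \frac{1}{\delta}}$.

(c) Simultaneously (in the same pass), compute $l_i = \mathfrak{A}(D, \mathcal{H}, Z_i, \delta')$.

(d) Put $u_i = \max\{\frac{l_i}{\beta}, t_i, 0\}$, $i = 0, 1$.

(e) Define $\lambda' = (1 + \epsilon)\lambda$, where $\lambda$ is the constant from Lemma 4.4, $\lambda = 1 + \frac{2(1-\epsilon)^{1/4}}{1 - (1-\epsilon)^{1/4}}$.

(f) Compute $U' = u_1$ if $u_1 > \lambda'\beta^2 u_0$; $U' = u_0$ if $u_0 > \lambda'\beta^2 u_1$; and $U' = 0$ otherwise.

2. Output $U$ to be the minimum of all $U'$s.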



Definition 4.2. Let $\mathcal{F}$ be a fixed function that defines implicit vectors, given a data stream and a fixed
randomness, and denote $V = \mathcal{F}(D, \mathcal{H})$ as a vector with entries $v_i$. For $\alpha > 0.5$, an $\alpha$-ThresholdMax
algorithm for restricted $\mathcal{F}$ is an algorithm that receives as input a data stream $D$ and access to a
randomness $\mathcal{H}$ and a random function $H : [n] \mapsto \{0, 1\}$, and in one pass over $D$ returns $U \ge 0$ such that
w.p. at least $1 - \delta$:

1. If $U > 0$ then $U$ is an $\epsilon$-approximation of $|v_i|$ for some $i$ with $H(i) = 1$.

2. If $|VH| > 0$ and there exists $i$ such that $H(i) = 1$ and $|v_i| \ge (1 - \alpha)|VH|$ then $U$ is an $\epsilon$-approximation
of $|v_i|$.

Theorem 4.3. Let $H$ be a fixed hash function defined as above and let $\epsilon < 0.1$. Let $M$ be an $s$-dimensional
tensor implicitly defined by a fixed function $\mathcal{F}$, stream $D$ and randomness $\mathcal{H}$, $M = \mathcal{F}(D, \mathcal{H})$. If there exist:

• An algorithm $\mathfrak{A}(D, \mathcal{H}, H, \delta)$ that in one pass obtains a $(\beta, \delta)$-approximation of $|W(M, H)|$ using memory
$\mu_1(n, m, \epsilon, \delta)$;

Here and thenceforth we denote by $VH$ a vector with entries $v_i H(i)$, $i \in [n]$.

• An algorithm $\mathfrak{B}(D, \mathcal{H}, H, \epsilon, \delta)$ that in one pass over $D$ obtains an $(\epsilon, \delta)$-approximation of
$|T_1(W(M, H))|$ using memory $\mu_2(n, m, \epsilon, \delta)$;

Let a = gfm- Then the Algorithm TensorTournament(D , H, H, e) is an a-ThresholdMax algorithm for 
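restricted $\mathcal{F}'$ (see Definition 4.2), where $\mathcal{F}'(D, \mathcal{H}) = AbsoluteVector(\mathcal{F}(D, \mathcal{H}))$. The algorithm makes a
single pass over $D$ and uses memory

$O\big(\frac{1}{\epsilon} \log \frac{1}{\delta} (\mu_1(n, m, \epsilon/3, \delta\epsilon/\log(1/\delta)) + \mu_2(n, m, \epsilon/3, \delta\epsilon/\log(1/\delta)) + \log nm)\big)$.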
e o 

Proof. 

Denote $M^t = W(M, Z_t)$ for $t = 0, 1$. Let $M_i = Hyperplane(M, i)$ for $i \in [n]$ and let $V'$ be a
vector with elements $|M_i|$. By Definition 1.11, $V' = \mathcal{F}'(D, \mathcal{H})$. Further, let $V$ be a vector with entries
$v_i = |M_i| H(i)$. We prove that the algorithm satisfies the two conditions of Definition 4.2 for the ThresholdMax
algorithm for $V'$ and $H$.

Proof of the first condition of Definition 4.2

We prove the following stronger statements which imply the first condition of Definition 4.2.

I. If there is no $(1 - \epsilon)$-significant entry $v_l$ then, w.p. at least $1 - \frac{\delta}{2}$, $U = 0$.

II. If $|V| > 0$ and there is a $(1 - \epsilon)$-significant entry $v_l$ then, w.p. at least $1 - \frac{\delta}{2}$, either $U = 0$ or $U$ is a
$3\epsilon$-approximation of $|v_l|$.

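Proof of statement I

By definitions of $\mathfrak{B}$, $\mathfrak{A}$, we have w.p. at least $1 - 8\delta'$ for $t = 0, 1$: $u_t \ge \frac{l_t}{\beta} \ge \frac{|M^t|}{\beta^2}$ and
$t_t \le (1 + \epsilon)|T_1(M^t)| \le (1 + \epsilon)|M^t|$; and $\frac{l_t}{\beta} \le |M^t|$. Thus,

$\frac{|M^t|}{\beta^2} \le u_t \le (1 + \epsilon)|M^t|$. (3)

Following the terminology of Lemma 4.4, we define $X = |M^1|$ and $Y = |M^0|$. We have the following
relations:

$|V| = \sum_i v_i = \sum_i H(i)|M_i| = \sum_{i \in [n]} H(i) \sum_{j' \in [n]^{s-1}} |m_{(i,j')}| = |W(M, H)|$,

$X = |M^1| = \sum_{j \in [n]^s} Z(j_1)H(j_1)|m_j| = \sum_{i \in [n]} Z(i)H(i) \sum_{j' \in [n]^{s-1}} |m_{(i,j')}| = \sum_i Z(i)H(i)|M_i| = \sum_i Z(i) v_i$,

and similarly

$Y = |M^0| = \sum_i (1 - Z(i))H(i)|M_i| = |V| - X = |V| - |M^1|$. (4)

By statement I, for all $i$, $v_i < (1 - \epsilon)|V|$. Thus we can apply Lemma 4.4. We have:

$P((|M^0| > \lambda|M^1|) \cup (|M^1| > \lambda|M^0|)) = P((X > \lambda Y) \cup (Y > \lambda X)) \le \sqrt{1 - \epsilon}$.

Let $\Upsilon$ be the event $(u_0 > \lambda'\beta^2 u_1) \cup (u_1 > \lambda'\beta^2 u_0)$. Let $\Phi$ be the event that $\frac{|M^t|}{\beta^2} \le u_t \le (1 + \epsilon)|M^t|$ for both
values of $t$. We have $P(\Upsilon) \le P(\Upsilon, \Phi) + P(\bar{\Phi})$. By (3), we have $P(\bar{\Phi}) \le 8\delta'$. Also, events $u_0 > \lambda'\beta^2 u_1$
and $\Phi$ imply that $|M^0| > \lambda|M^1|$; indeed:

$|M^0| \ge \frac{u_0}{1 + \epsilon} > \frac{\lambda'}{1 + \epsilon}\beta^2 u_1 \ge \frac{\lambda'}{1 + \epsilon}|M^1| = \lambda|M^1|$.

Thus we have

$P(\Upsilon, \Phi) \le P((|M^0| > \lambda|M^1|) \cup (|M^1| > \lambda|M^0|)) \le \sqrt{1 - \epsilon}$.

We summarize that if no $(1 - \epsilon)$-significant $v_i$ exists, then

$P(U' \ne 0) \le P(\Upsilon) \le \sqrt{1 - \epsilon} + O(\delta') \le \sqrt{1 - \epsilon/2}$.

Recall that the number of repetitions is $O(\frac{1}{p} \log \frac{1}{\delta})$, where $p = 1 - \sqrt{1 - \epsilon/2}$. Thus $P(U \ne 0) \le
(1 - p)^{\frac{1}{p} \log \frac{2}{\delta}} \le \frac{\delta}{2}$.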

Proof of statement II

Let $v_l$ be a $(1 - \epsilon)$-significant entry of $V$. Assume, w.l.o.g., that for one execution of the main cycle of
the Tournament algorithm, $Z(l) = 0$. Statement II implies $|V| > 0$, which implies $v_l = |M_l|H(l) > 0$,
which implies $(1 - Z(l))H(l) = 1$. Thus, $|Hyperplane(M^0, l)| = |Hyperplane(W(M, (1 - Z)H), l)| =
|M_l| = v_l$. Therefore by (4), $|Hyperplane(M^0, l)| = v_l \ge (1 - \epsilon)|V| \ge (1 - \epsilon)|M^0|$, i.e., the $l$-th
hyperplane of $M^0$ is $(1 - \epsilon)$-significant. By Fact 3.6, $|T_1(M^0)|$ is a $2\epsilon$-approximation of $|M_l|$. By the
assumptions of the theorem, $\mathfrak{B}$ returns an $\epsilon$-approximation of $|T_1(M^0)|$. Thus, $t_0$ is a $3\epsilon$-approximation of
$|M_l|$, w.p. at least $1 - \delta'$, in which case

$u_0 \ge t_0 \ge (1 - 3\epsilon)|M_l|$.

Also, by the assumption of Theorem 4.3, w.p. at least $1 - \delta'$, we have $\frac{l_0}{\beta} \le |M^0|$. Thus

$u_0 = \max\{\frac{l_0}{\beta}, t_0, 0\} \le \max\{|M^0|, (1 + 3\epsilon)|M_l|\} \le (1 + 3\epsilon)|M_l|$.

On the other hand, w.p. at least $1 - 2\delta'$,

$u_1 = \max\{\frac{l_1}{\beta}, t_1, 0\} \le \max\{|M^1|, (1 + \epsilon)|M^1|\} = (1 + \epsilon)|M^1|$.

But since $Z(l) = 0$ we have by (4):

$|M^1| = |V| - |M^0| \le |V| - |Hyperplane(M^0, l)| = |V| - v_l \le \frac{\epsilon}{1 - \epsilon}|M_l|$.

Combining all of the above computations, we conclude that w.p. at least $1 - 4\delta'$ (for sufficiently small $\epsilon$,
e.g., $\epsilon < 0.1$):

$u_1 \le (1 + \epsilon)|M^1| \le \frac{\epsilon(1 + \epsilon)}{1 - \epsilon}|M_l| \le \frac{\epsilon(1 + \epsilon)}{(1 - \epsilon)(1 - 3\epsilon)} u_0 \le \lambda' u_0$.

Thus, $U'$ is equal to either $0$ or $u_0$ w.p. at least $1 - 4\delta'$. Recall that simultaneously $u_0$ is a $3\epsilon$-approximation of
$|M_l| = v_l$. The same inequality is true if $Z(l) = 1$. By the union bound, w.p. at least $1 - O(\frac{1}{p} \log \frac{1}{\delta}\,\delta') = 1 - \Omega(\delta)$,
$U$ is either $0$ or a $3\epsilon$-approximation of $v_l$.



Proof of the second condition of Definition 4.2

Finally, consider the case when $v_l$ is a $(1 - \alpha)$-significant entry of $V$. Consider the case when $Z(l) = 0$.
Repeating the arguments from the proof of statement II, we have, w.p. at least $1 - 4\delta'$, that $u_0$ is a $3\epsilon$-approximation
of $v_l$ and

$u_1 \le (1 + \epsilon)|M^1| \le (1 + \epsilon)\frac{\alpha}{1 - \alpha} v_l \le 4\alpha v_l$.

Therefore,

$u_0 \ge (1 - 3\epsilon) v_l \ge \frac{1 - 3\epsilon}{4\alpha} u_1 \ge \lambda'\beta^2 u_1$.

Thus, w.p. $1 - 4\delta'$, $U' = u_0 = (1 \pm 3\epsilon)v_l$. The same is true when $Z(l) = 1$. Thus, $U$ is a $3\epsilon$-approximation
of $v_l$ w.p. at least $1 - \Omega(\delta)$.


Conclusion and memory analysis

Since both conditions of Definition 4.2 are met (substituting $\epsilon$ with $\epsilon/3$), we conclude that
TensorTournament is an $\alpha$-ThresholdMax algorithm for restricted $\mathcal{F}'$. Let us count the memory needed
for a single iteration of the main cycle of the algorithm. To generate pairwise independent $Z$, we need
$O(\log n)$ bits. In addition, we need $\mu_1 + \mu_2$ for the algorithms $\mathfrak{B}$ and $\mathfrak{A}$ and $O(\log nm)$ bits to keep the
auxiliary variables. Thus, in total we need memory

$O\big(\frac{1}{\epsilon} \log \frac{1}{\delta} (\mu_1(n, m, \epsilon/3, \delta\epsilon/\log(1/\delta)) + \mu_2(n, m, \epsilon/3, \delta\epsilon/\log(1/\delta)) + \log nm)\big)$.

Recall that we do not count memory required to store $H$ and $\mathcal{H}$. □

Lemma 4.4. Let $V$ be an $n$-dimensional vector with non-negative entries $v_i \ge 0$, $i \in [n]$. Let $Z$ be a 2-wise
independent random hash function from $[n]$ to $\{0, 1\}$, such that $P(Z(i) = 1) = 0.5$. Let $X = \sum_i v_i Z(i)$
and $Y = L_1(V) - X$. If there exists $\epsilon > 0$ such that for all $i$, $v_i \le (1 - \epsilon)L_1(V)$, then for
$\lambda = \lambda(\epsilon) \ge 1 + \frac{2(1-\epsilon)^{1/4}}{1 - (1-\epsilon)^{1/4}}$ we have

$P((X > \lambda Y) \cup (Y > \lambda X)) \le \sqrt{1 - \epsilon}$.

Proof. Clearly, $E(X) = L_1(V)/2$. Further, by 2-wise independence of $Z$, we have

$E(X^2) = E\big((\sum_i v_i Z(i))^2\big) = \frac{1}{2} \sum_i v_i^2 + \frac{1}{4} \sum_{i \ne j} v_i v_j = \frac{1}{4} \sum_i v_i^2 + E(X)^2$.

Thus, by the assumption that $v_i \le (1 - \epsilon)L_1(V)$, we have:

$Var(X) = E(X^2) - E(X)^2 = \frac{1}{4} \sum_i v_i^2 \le \frac{1 - \epsilon}{4} L_1(V)^2$.

Thus, $\sigma(X) \le \frac{\sqrt{1 - \epsilon}}{2} L_1(V)$. Note that the event $X > \lambda Y$ is equivalent to the event $X - E(X) > \frac{\lambda - 1}{2(\lambda + 1)} L_1(V)$
and the event $Y > \lambda X$ is equivalent to the event $E(X) - X > \frac{\lambda - 1}{2(\lambda + 1)} L_1(V)$. Thus, by Chebyshev's inequality,

$P((X > \lambda Y) \cup (Y > \lambda X)) = P\big(|E(X) - X| > \frac{\lambda - 1}{2(\lambda + 1)} L_1(V)\big) \le (1 - \epsilon)\big(\frac{\lambda + 1}{\lambda - 1}\big)^2 \le \sqrt{1 - \epsilon}$

for $\lambda \ge 1 + \frac{2(1-\epsilon)^{1/4}}{1 - (1-\epsilon)^{1/4}}$.

Note that if there is at most one strictly positive $v_i$, then $P((X > \lambda Y) \cup (Y > \lambda X)) = 1$, which seems to
contradict our lemma. However, in this case, there exists $v_i$ such that $v_i = L_1(V)$, and thus the assumption
of the lemma does not hold. Generally, the assumptions imply that there exist at least two strictly positive
entries $v_i$. □
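
Lemma 4.4 can be checked exactly on a small vector by enumerating a concrete pairwise independent family. The sketch below uses the standard parity construction $Z_{a,b}(i) = \langle a, i \rangle \oplus b$ over the bits of $i$, which is an illustrative choice and not necessarily the family intended in this paper; the names `parity_hash_family`, `lambda_of` and `unbalanced_probability` are hypothetical.

```python
from math import sqrt


def parity_hash_family(d):
    """All maps Z_{a,b}(i) = <a, bits(i)> xor b on {0, ..., 2^d - 1}.

    Drawing (a, b) uniformly gives a pairwise independent family with
    P(Z(i) = 1) = 1/2 for every i.
    """
    for a in range(2 ** d):
        for b in (0, 1):
            yield lambda i, a=a, b=b: (bin(a & i).count("1") + b) % 2


def lambda_of(eps):
    """Smallest lambda allowed by the lemma for a given eps."""
    r = (1 - eps) ** 0.25
    return 1 + 2 * r / (1 - r)


def unbalanced_probability(v, eps):
    """Exact P((X > lam * Y) or (Y > lam * X)) over the parity family."""
    d = (len(v) - 1).bit_length()
    lam = lambda_of(eps)
    total = sum(v)
    family = list(parity_hash_family(d))
    bad = 0
    for Z in family:
        X = sum(vi for i, vi in enumerate(v) if Z(i) == 1)
        Y = total - X
        if X > lam * Y or Y > lam * X:
            bad += 1
    return bad / len(family)


v = [3, 2, 2, 1]
eps = 1 - max(v) / sum(v)  # largest eps with v_i <= (1 - eps) * L1(V)
prob = unbalanced_probability(v, eps)
bound = sqrt(1 - eps)
```

For this vector only the two constant functions produce an unbalanced split, so the exact probability is 2/8, below the bound $\sqrt{3/8}$ given by the lemma.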

5 Approximating $L_1$ Norms of Implicit Vectors

Definition 5.1. Let $V$ with $v_i \ge 0$ be a vector from $R^n$. A set $\mathcal{U}$ of positive numbers is an $\epsilon$-cover of $V$ if:

1. All elements of $\mathcal{U}$ are $\epsilon$-approximations of distinct and positive coordinates from $V$. I.e., there is a one-to-one
mapping $\rho$ from the set $\mathcal{U}$ to a subset $S' \subseteq [n]$ such that for all $u \in \mathcal{U}$, $u$ is an $\epsilon$-approximation
of $v_{\rho(u)}$.

2. $\mathcal{U}$ contains $\epsilon$-approximations of all $\epsilon$-significant elements of $V$. I.e., for all $v_i$ such that $v_i \ge \epsilon|V|$, it
is true that $i \in S'$.

The size of the cover is $|\mathcal{U}|$.

Definition 5.2. Let $\mathcal{F}$ be a fixed function that implicitly defines vectors, given a data stream $D$ and a fixed
randomness $\mathcal{H}$. Denote $V = \mathcal{F}(D, \mathcal{H})$. A Cover algorithm for restricted $\mathcal{F}$ is an algorithm that receives as
input a data stream $D$, access to a randomness $\mathcal{H}$ and a random function $H : [n] \mapsto \{0, 1\}$, and an $\epsilon$
and $\delta$. The algorithm makes a single pass over $D$ and, w.p. at least $1 - \delta$, returns an $\epsilon$-cover of the vector with
entries $v_i H(i)$.
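
The two conditions of Definition 5.1 can be spelled out as a small checker. This is a sketch under assumptions: the cover is passed together with its witness mapping (a dictionary from coordinate index $\rho(u)$ to the estimate $u$), and "$\epsilon$-approximation" is read as lying within a factor $(1 \pm \epsilon)$ of the true value. The name `is_eps_cover` is hypothetical.

```python
def is_eps_cover(V, cover, eps):
    """cover maps a coordinate index i (the witness rho(u)) to its estimate u."""
    norm = sum(V)
    # Condition 1: every estimate approximates a distinct positive coordinate.
    # Distinctness holds because dictionary keys are unique.
    for i, u in cover.items():
        if V[i] <= 0 or u <= 0:
            return False
        if not (1 - eps) * V[i] <= u <= (1 + eps) * V[i]:
            return False
    # Condition 2: every eps-significant coordinate has a witness.
    significant = [i for i, vi in enumerate(V) if vi > 0 and vi >= eps * norm]
    return all(i in cover for i in significant)


V = [50, 30, 15, 5, 0]
good = is_eps_cover(V, {0: 52, 1: 29, 2: 15}, eps=0.1)
missing_heavy = is_eps_cover(V, {0: 52, 2: 15}, eps=0.1)
bad_estimate = is_eps_cover(V, {0: 60, 1: 29, 2: 15}, eps=0.1)
```

With $|V| = 100$ and $\epsilon = 0.1$, the significant coordinates are the first three; dropping one of them or reporting 60 for the entry 50 violates the definition.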

5.1 Witnessing $\epsilon$-Significant Hyperplanes

Lemma 5.3. Let $\mathcal{F}$ be a fixed function that implicitly defines vectors, given a data stream $D$ and a fixed
randomness $\mathcal{H}$. The existence of an $\alpha$-ThresholdMax algorithm for restricted $\mathcal{F}$ that uses memory $\mu(n, m, \epsilon, \delta)$
implies the existence of a Cover algorithm for restricted $\mathcal{F}$ for any $\epsilon$. The Cover algorithm uses memory
0(-^(^(n,m,e,5 2 e 2 a) + log ram)). 

Proof. Denote by £ a (D, H, H, e, 5) the existing a-ThresholdMax algorithm for restricted T. 

Using £ Q we construct the following algorithm. Let e' = e 2 <5/3 and q = [p^]. Let G be a pairwise 
independent random hash function from $[n]$ to $[g]$ that is independent of $H$ and $\mathcal{H}$. For $s \in [g]$, define the
function $F_s$ as $F_s(i) = \mathbf{1}_{G(i) = s}$ and execute, in parallel for all $s$, $\mathfrak{T}_\alpha(D, \mathcal{H}, HF_s, \epsilon, \delta/g)$. Let $U_s$ be the
output of the $s$-th run of $\mathfrak{T}_\alpha$. The output of our new algorithm is the set of all strictly positive $U_s$. We show below
that the output is indeed an $\epsilon$-cover of $V$ with probability at least $1 - \delta$.

Let $V = \mathcal{F}(D, \mathcal{H})$ be a vector with entries $v_i$ and let $V_s$ be a vector with entries $v_{s,i} = v_i F_s(i)$. By the
union bound and by the definition of an $\alpha$-ThresholdMax algorithm, w.p. at least $1 - \delta$, every positive $U_s$ is
an $\epsilon$-approximation of $|v_{i_s}|$ for some $i_s$ with $H(i_s)F_s(i_s) = 1$. But this implies that $U_s$ is an approximation
of $|v_{i_s}|$ with $H(i_s) = 1$. Since $G$ splits $[n]$ into disjoint subsets, the output of our algorithm corresponds to
$\epsilon$-approximations of absolute values of a set of distinct entries of $V$. I.e., the first condition of an $\epsilon$-cover is
correct.

To show that the second condition is true as well, let S e be set of all is such that \ viH(i)\ > e\VH\ > 0. 
Consider a fixed i G S e . Let 

X t = \VHF G(i) \ - \ Vi \ = \ Vj \H(j)F G{i) (j) > 0. 

By pairwise independency of G: 

EiX,) = £ \ Vj \H(j)P(G(j) = G(i)) < 

Let be the event that X, > -^r\VH\; by Markov inequality P(^i) < j. Note that if does not happen, 
then 

\VHF G[i) \ - \ Vi \ < -^—;\VH\ < ^{vil < a\ Vi \, 
y ' ge ge 

in which case \v{\ > (1 — a)\VHF G ^\. Let T/ be the event that U G ^ is not an e-approximation of \vi\. By 
the properties of algorithm £ a , P(Tj|*j) < |. Thus 

P(T i )<P(T i \* i )+P(* i )<- + -. 

g 6 

Finally, let be the event where there is a collision between i and j. By pairwise independence of G, 
P(&ij) = |, and thus the probability of collisions for e-significant entries is bounded by Thus, the 
probability that the output of the algorithm does not meet the second condition of e-cover is bounded by 

P({U teS ^ l ) U (Uijes^ij)) ^J e + ^2 + ^2^ 6 - 

□

5.2 The $\epsilon$-Approximation

Definition 5.4. Let $\mathcal{F}$ be a fixed function that defines an implicit vector $V = \mathcal{F}(D, \mathcal{H})$, given $D$ and a
randomness $\mathcal{H}$, as in Definition 1.6. An algorithm that receives as input a data stream $D$ and access
to a randomness $\mathcal{H}$ and in one pass over $D$ returns an $(\epsilon, \delta)$-approximation of $|\mathcal{F}(D, \mathcal{H})|$ is called an
$(\epsilon, \delta)$-approximation algorithm for $L_1(\mathcal{F})$.

The main goal of this section is to prove

Lemma 5.5. Let $\mathcal{F}$ be a fixed function that defines an implicit vector $V = \mathcal{F}(D, \mathcal{H})$, given $D$ and a randomness
$\mathcal{H}$. Assume that $V$ has non-negative entries bounded by poly$(n, m)$. Then the existence of a Cover
algorithm $\mathfrak{Q}(D, \mathcal{H}, H, \epsilon, \delta)$ for restricted $\mathcal{F}$ (see Definition 5.2) that uses memory $\mu(n, m, \epsilon, \delta)$ implies the
existence of an $(\epsilon, 2/3)$-approximation algorithm for $L_1(\mathcal{F})$ (Definition 5.4) that uses memory

\e log (nm) log(nm) e z J 

5.2.1 Notations 

In this section, let $0 < \epsilon < 1$ be a constant. Define

16 

a = 0(log (1+e) n), b = O (log (1+e) nm), x' = 10(a+b), X=\-^x'~\, 

fl-r^l, C = d + £ ) W -1, c = mi„ { ^,^). 

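For $x \ge \chi$, let $f_\chi(x)$ be an integer such that $\chi(1 + \epsilon)^{f_\chi(x) - 1} < x \le \chi(1 + \epsilon)^{f_\chi(x)}$, i.e., $f_\chi(x) = \lceil \log_{(1+\epsilon)} \frac{x}{\chi} \rceil$.
It is easy to see that for $x \ge \chi$ we have $f_\chi(x) \ge 0$.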

5.2.2 Technical Lemmas 

Let $n_{rest}$ be such that $n \ge n_{rest} \ge \chi$. For any $j \in [a]$ and for every $i \in [n_{rest}]$, let $X_{i,j}$ be pairwise
independent zero-one random variables with $P(X_{i,j} = 1) = \frac{1}{(1+\epsilon)^j}$. Let $Y_j = \sum_{i \in [n_{rest}]} X_{i,j}$.

Fact 5.6.

Proof. Let $j_0 = f_\chi(n_{rest})$; note that $j_0 \ge 0$ since $n_{rest} \ge \chi$. We have $E(Y_{j_0}) = \frac{n_{rest}}{(1+\epsilon)^{j_0}}$ and, by pairwise
independence of the $X_{i,j}$:

Var(Y j0 ) = n rest Var(X 1(hl ) = ~ JTTTfi^ " " * 

Let e' = |; we have, by Chebyshev's inequality: 



Also > \ Y h ~ (T^l < e'^o im P lies 



^<(1 + 0(^<(1 + 3 C )X, 



and 



Yj0 >[L e) (l+ e y°- [L e) (l+e) - (1+6)2- 



□ 



Fact 5.7. Le? Z = maxj e ^{j : (jj^p < < (1 + 3e)x} i/af feeut o«e such j exists and otherwise. 
Then 

P(Z > f x (n rest ) + 2) < 1-1. 

X 

Proo/ Let jo = fx( n rest) and consider fixed j' > jo + 2. We have, 

E (Y,) = , Ures \ ., , 

and by pairwise independency of Xs 

VariY,) = n rest Var(X jf A = , Hres \ ., (1 - - l — ) < - Hres \ , < 

Thus, 

X ^ X 



26 



11 11 

1 1 < (1 + e) 3 1 



X (1 + e^'-w^igjff " e 2 X (1 + e)i'-io-3' 
Clearly Z = f implies Y,y > h^ah ■ Thus, and by union bound over all f > j + 3, we have that 



P (z> , + 2)£ (l±i)! ± _L_ < + 0! (i±0 <i, 

A i'=io+3 v ; A A 



□ 



Corollary 5.8. Let Y- = Ylje[n re st] a hi-^i,j' where aij are arbitrary random zero-one variables. For 
Z' = max je[a] {j : j^jfei < Yj < (1 + 3e)x}, it is true that P{Z' > f x (n rest ) + 2) < ±. 

Proof. We have for any j 

P(Z' = j) < P(Y' > -^-) < P(Yj > X 



J " (l + e) 2j ~ v 3 " (1 + e) 
Thus, we can repeat the arguments from Fact l5.7l □ 

Fact 5.9. Let ( = (1 + e) 1 ^ - 1, then C>^- 
Proof. If C < then we have 

Q /nN Q Q j Q i 

(i + c) Q -i = Ec l (^)<Ew<E^ = Eii^- 

X — 1_ z — 1 z — 1 % — 1 

Thus, it must be the case that £ > ^y. □ 
5.2.3 The Algorithm and Proof of Lemma 1531 



Algorithm 5.10. <5(D,H,e,5) 

1. Pick a random integer $q$ from $0, \ldots, Q - 1$.

2. For any $j \in [a]$ generate pairwise-independent random hash functions $G_j : [n] \mapsto \{0, 1\}$ such
that for any $i \in [n]$, $P(G_j(i) = 1) = \frac{1}{(1+\epsilon)^j}$.

J. parallel, apply Qj = £l(D, 7i, Gj, c, ^r)far all j = 0, . . . , a. 

4. For all $0 \le j \le a$ and all $l = -1, \ldots, b$ compute $Y_{l,j}$, the number of elements returned by
$\mathfrak{Q}_j$ in the range $[(1 + \zeta)^q(1 + \epsilon)^l, (1 + \zeta)^q(1 + \epsilon)^{l+1})$.

5. For every $l \in [b]$ compute $Z_l = \max_{j \ge 0}\{j : \frac{\chi}{(1+\epsilon)^2} \le Y_{l,j} \le (1 + 3\epsilon)\chi\}$; define $Z_l = 0$ if no
such $j$ exists.

6. Return $(1 + \zeta)^q \sum_{l \in [b]} (1 + \epsilon)^{l + Z_l} Y_{l,Z_l}$.

Let $\mathcal{F}$ be a fixed function that defines a vector $V = \mathcal{F}(D, \mathcal{H})$ with non-negative entries $v_i$ such that
$L_\infty(V) = poly(n, m)$. Define $q$ to be a uniform random integer from $0, \ldots, Q - 1$. For $l = -1, \ldots, b$,
define a "layer" $S_l$ as the set of all $v_i$s in the range $[(1 + \zeta)^q(1 + \epsilon)^l, (1 + \zeta)^q(1 + \epsilon)^{l+1})$. Denote by $s_l$ the
number of elements in $S_l$. For any $l$ define a left boundary sub-layer $S_{l,left}$ as the set of all $v_i$s in the range
$[(1 + \zeta)^{q-1}(1 + \epsilon)^l, (1 + \zeta)^q(1 + \epsilon)^l)$ and $s_{l,left}$ to be its size. For any $l$ define a right boundary sub-layer
$S_{l,right}$ as the set of all $v_i$s in the range $[(1 + \zeta)^q(1 + \epsilon)^l, (1 + \zeta)^{q+1}(1 + \epsilon)^l)$ and $s_{l,right}$ to be its size. Let $\mathfrak{S}$
be the set of all elements in boundary (left or right) sub-layers. It is straightforward to see that the total weight of
the elements in $\mathfrak{S}$ is small, w.h.p.:

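Fact 5.11. $P\big(\sum_{v_i \in \mathfrak{S}} v_i \ge \frac{20}{Q}|V|\big) \le 0.1$.

Proof. For a fixed $v_i$, let $j = Qx + y$, $0 \le y < Q$, be such that $(1 + \zeta)^{j-1} \le v_i < (1 + \zeta)^j$. Then
$P(v_i \in \mathfrak{S}) = P(q - 1 \le y \le q) = \frac{2}{Q}$. Thus, by Markov inequality, $P\big(\sum_{v_i \in \mathfrak{S}} v_i \ge \frac{20}{Q}|V|\big) \le 0.1$. □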
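
The layering idea can be illustrated in isolation: if every entry is replaced by the upper boundary of its geometric layer, then knowing only the number of entries per layer already determines the $L_1$ norm up to a factor $(1 + \epsilon)$. The sketch below is a deterministic toy version under assumptions: it counts the layers exactly, uses no random shift $(1 + \zeta)^q$ and no sampling levels, assumes entries are at least 1, and rounds up rather than using the lower boundary as in Algorithm 5.10. The name `layered_l1_estimate` is hypothetical.

```python
def layered_l1_estimate(values, eps):
    """Round each positive value up to the top of its geometric layer and sum."""
    counts = {}
    for v in values:
        if v <= 0:
            continue
        # Find l with (1 + eps)^l <= v < (1 + eps)^(l + 1); v >= 1 is assumed.
        layer, upper = 0, 1 + eps
        while v >= upper:
            layer += 1
            upper *= 1 + eps
        counts[layer] = counts.get(layer, 0) + 1
    # Only the per-layer counts are used from here on.
    return sum(c * (1 + eps) ** (layer + 1) for layer, c in counts.items())


values = [1, 3, 3, 7, 20, 150]
eps = 0.1
exact = sum(values)
estimate = layered_l1_estimate(values, eps)
```

The streaming algorithm cannot count layers exactly; it estimates each count from the sampled covers, which is where the events analyzed in the rest of this section enter.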

Proof of Lemma 15.51 We prove that Algorithm 15.101 satisfies the requirement of the lemma. Let B be the 
event that for all j, <Bj returns a c-cover of VGj (see Definition 15. lb . By parameters of 5Sj and by the union 
bound P(B) > 1 - ^ f or any fixed functions Gj. Let V be the event that E^eS v i < fr 1^1- B Y Fact BTTTI 
we have P(T>) > 0.9. In the remainder of this section we assume that B, V are true. The key observation is 
that if B is true then any Vi ^ 6 is not misclassified; i.e., if an approximation of Vi is returned, then it will 
belong to the same layer as V{. 

I. Upper Bound 

To prove the upper bound, we distinguish between large and small layers. A layer Si is large if s/ = 
$i + sij e ft + Slight > X> an d small otherwise. Consider a fixed /; if Si is a large layer, then Corollary 15.81 
is applicable as follows. Let JQj be the indicator of the event that Gj (i) = 1, and let , . . . , Uj. be the 
elements from Si U Sij e f t U Slight- Let atij be the indicator random variable that the approximation of 
will be counted byj^j. Since B is true, no elements outside of Si U Sij e f t U Slight can be counted. Thus, 
we can write 




t=i 
and apply Corollary 5.8 with $n_{rest} = \hat{s}_l$ and an appropriate enumeration of the $X$s. Therefore, by Corollary 5.8,
w.p. at least 1 — j^, 

Zi <f x (s l ) + 2. 

Consider the case that Z\ > 0. Then, by definition of Z\, we have 3^ < (1 + 3e)x, and thus by definition 
of f x : 

(1 + e) z <y ltZl < (1 + e )fxW+2 (1 + 3e ) x < (1 + e )% 

Also if Z\ = then 3^,0 — % assuming B. In this case we have (1 + e) Zi Y i z l < Sj. Thus, for any large 
layer, we have w.p. at least 1 — j^: 

{l + e) Zi y hZl < (1 + efk. 

Consider the case when Si is small. For the purposes of our analysis, we can add to 3^ j arbitrary elements 
USj.fi,... v x+ \ £ Si U Si t i e ft U Slight and define a^j = for all j and for all t > S/. Thus, the above 
bounds will be valid. Thus, we conclude that for every layer Si the approximation of its cardinality exceeds 
(1 + e) 6 si w.p. at most \. By union bound and by Fact 15.1 ll w.p. at least 1 — X: 

(1 + () q J> + efi + % Zl < (1 + C) 9 ^(1 + e) i+6 ^ < 
ie[6] ze[b] 

^ Vi(l + e) 7 +Yl < l + £ ) 7 + e ) 7 ( 1 + 20e )l y l' 

ie[n] ^es 

//. Lower Bound 

Now let us prove the lower bound. Assuming B, the only elements from Si that cannot be counted by 3* are 
those from Si^ight and S^+i.te/t- Let Sj = Si\(Si-ij e ftUSi + i yr i ght ) and let = sj- s/ +1) / e / t - si-i^ight 
to be its size. We change a definition of a large layer; is large if si > x, and imal/ otherwise. Consider 
an -^--significant layer Si. 
//./. forge layers 

First, let us assume that Si is large. Let u^ , . . . , Uj. be elements from S 1 /. Let Xjj = 1(3^(^=1 and let 
Ylj = St=i Xh,j> i- e -> * s tne num ber of elements among u^, . . . , Uj. that has not been zeroed by Gj. 
Consider an event A that nj^p < *i.f x (*t) — ^(1 ^ e )" ^ Fact 15 .61 we have 

P(.4) > 1 - I. 

X 
Let $R = \sum_{v_i \notin S'_l} G_{f_\chi(s'_l)}(i)\, v_i$ be the total weight of all elements that do not belong to $S'_l$ and contribute to

|KG fx(Sl) |.Wehave 

E ^ = (i + ejfxW S ^ - (i + € )fx(s,)- 
Consider the event C that R < j^y^j- We have by Markov inequality that 

P(C) > 1 - 1. 

Below we prove that all elements from S 1 , will belong to c-cover returned by 0f (sj)- Recall that for any 
V{ G <§/ we have (l+£) 9 (l+e)' -1 < < (l+Q q (l+e) 1 . Thus, for every G S 1 , since S 1 , is -^--significant, 
C is true and by definition of f x : 

t* > (1 + C) 9 (l + et 1 > ±-±—\V\ > -L-±—R(l + e)f*<««> > 

X'{l + e)si x ,z {l + e)si 



2XX' 2 

Since „4 is true it follows that ^,f x (s ( ) < x(l + 3e). Thus, 



«i > (1 + C) 9 (l + e) 1 ' 1 > + C) 9 (l + e)^ 1 = 

^ E G fctt) (0(l + C) s (l + e) M > ^ E G fa(4l) (iK- 



Thus, we conclude that 



Vi ^4^\ VG fx(Si)\- 



But this bound and B imply that all Vi G <§/ with Gj = 1 will be found by Qf x (si) an d counted by 

y lMSl) . Thus 

tyfxCSi) > YiMh) - n + e )2- (5) 
Let = Si^e/t U Si-i tr ig ht U St+ijeft U fi^ht C 6. Let be the number of elements in £)j. Then since 
D is true, we have: 

(i + cr : (i + c)'-, > e * ^ 7^1 ^ ^ 7 E ^ 



7 E «*> 70,(1 +cr 1 (i+c)' 



e ^— ' e 
Thus,

0/ < e(l + e)§i < 2esi. 

Consider Y = Ylv-eSuO ^fx(^)W - Assuming B, only elements from S\ U D; can contribute to 3^,f x (s()> 
and thus 3^(5;) < F. Further, we have 

= (l+ C )fxW ^ (l +C )fxW ^ (1 + 2£)X - 

Also, by pairwise independence of we have Far(Y) < E(Y). Thus, by Chebyshev inequality: 

P(Y > (1 + 3e) X ) = P(t - E(Y) > (1 + 3e) X - E(Y)) < 

s ^ar(Y) (l + 2e) 1 
e*X eX X 



Therefore, 



P(y lMSl) >(l + 3e)x)<± (6) 



By © and ©, w.p. at least 1-^we have ^ < 3^,f x (s,) < (1 + 3e)x, in which case ^ > f x (s/) > 
and thus by definitions of Z\ and f x : 



(1 + e) z 'y, lZ| > (1 + e)^—^—, > 



si 



(1 + e) 2 " (1 + e) 2 ' 
11.2. small layers 

Similarly, if Si is small and Z\ > we have 
Otherwise if Zi = then, w.h.p. Yj o > % Indeed, for every «j € 5/ we have that 



Thus, Q wm return approximations of all elements from Si w.p. at least 1 — \\ and all approximations 

Zr 



will be counted towards 3^,o- Thus (1 + e) i y^z l > % 
11.3. putting it all together 

By union bound, for all I such that S\ is ^-significant layers, w.p. at least 1 — ^ we have 

(i + ef i yi, Zl > 



(1+6)2- 
Note that

M = £*+££*• 

Let C be the set of all I such that Si is ^--significant. Assuming D, we have for sufficiently large n: 

e 2 e 

£ v - + £ £ u * ^ 20 i — i y i + b -i y i ^ e i y i- 

log n y 

We have obtained that w.p. at least 1 — ^: 

(1 + 0" £(1 + ef' + % Zl > (1 + C) 9 £(1 + efi + % 2l > £(1 + C) 9 (l + ^Tj^ja > 



y, m (1-6) 

2^2^ i + e) 2- 1 + e)a l y l 



///. Conclusion 



We have shown that, w.p. at least (0.9)(1 - 4)(1 - ^) > 2/3, the output of Algorithm [5.10l is greater than 
or equal to fegp 1^1 an ^ smaller than or equal to (1 + e) 7 (l + 20e)| V|. By replacing e with an appropriate 
e' = 0(e), we obtain an e-approximation of |V|. 

IV. Memory bounds 

We apply o algorithms 0, thus the total memory required for these is o(/x(n, m, c, A-)). To generate 
pairwise-independent functions i7, we need O(alogn) memory bits. We also maintain ab counters y. 
In total, by Fact |5.9l we need 

O (- log(n)/x(n, m, - — ^ -, - — j — A + \ log 2 (ram) 

\e log (nm) log(nm) e z 

memory bits. □ 



6 Proving Lemmas 2.3 and 2.4

Lemma 2.3. There exists an algorithm $\mathfrak{B}_{k-1}$ that, given a data stream $D$ and access to hash functions
$H_1, \ldots, H_{k-1}$, in one pass obtains an $\epsilon$-approximation of $|T_{k-1}(W(M_{Ind}, H_1, \ldots, H_{k-1}))|$ using memory
O(^logilog^). 
Proof. For $j \in [n]$, define $C_j$ to be independent random variables with Cauchy distribution. For $i \in [n]^{k-1}$
denote = nf=i Define 

Z = zt C A E "(wW) 
By the arguments from [31], a median of £l(-^z log |) independent Zs is an (e, (^-approximation of 



fc-i 



£ 

3=1 



E m (i,i) H (^ 



ie n ' 



\Tk-i(W(Mj n d, Hi, . . . , Hk-i))\. 



To construct $Z$ in a single pass over $D$, we follow the ideas from [32]. Define $k + 1$ random variables
$Joint, Margin_1, \ldots, Margin_k$ to be initially equal to $0$ and to be updated as follows. Upon receiving
a $k$-tuple $(i, j) \in [n]^k$, $i \in [n]^{k-1}$, $j \in [n]$, we put $Joint = Joint + H(i)C_j$. For $s < k$, we put

$Margin_s = Margin_s + H_s(i_s)$. Finally we put $Margin_k = Margin_k + C_j$. We have

Joint =^Cj f{i,3) U ^)- 

je[n] ie[n] k - 1 



Also, for s < k we have 



Finally 



Thus, 

k 

\{Margins = \ Y^ C ihV) 

s=l 



Margins = ^ / s (i s ).ff s (i s 



Margin k = ^ fk(j) c j- 

J'6H 



m 



E^' E ^W-fproditctC^j))- 



ieeh 



Thus, 



??? 



3=1 



s=l ; J \i£[nj' 

What remains is to analyze the memory bounds. Recall that we do not count the memory of $\mathcal{H}$, which will
be analyzed separately. Thus, we need to bound the memory needed to compute the $Z$s. To compute $Z$, our
algorithm accesses $n$ random variables $C_j$ and computes a sketch that is a weighted sum of the $C_j$s. Indyk shows
in [31] (see Sections 3.2 and 3.3) that if the coefficients of the $C_j$s are polynomially bounded integers, then
it is possible to maintain such a sum with sufficient precision using O(log ^) memory bits. By Fact 3.7
all entries of $T_s(W(M_{Ind}, \mathcal{H}))$ are polynomially bounded integers; thus, we can repeat the arguments from
[31] and the lemma follows.

□ 
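The one-pass update rule in the proof above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the hash functions are taken to be random sign functions, indices are 0-based, and the names `one_pass`, `closed_form` and `demo` are hypothetical helpers that do not come from the paper. The code maintains Joint and the Margin variables over a stream and compares them with the closed-form sums over the frequency tables.

```python
import math
import random
from collections import Counter

def cauchy(rng):
    """Draw a standard Cauchy sample by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def one_pass(stream, k, H, C):
    """Process k-tuples (i_1..i_{k-1}, j); return (Joint, [Margin_1..Margin_k])."""
    joint, margin = 0.0, [0.0] * k
    for *i, j in stream:
        # Joint += H(i) * C_j, with H(i) the product of the k-1 hash values.
        joint += math.prod(H[s][i[s]] for s in range(k - 1)) * C[j]
        for s in range(k - 1):
            margin[s] += H[s][i[s]]   # Margin_s += H_s(i_s)
        margin[k - 1] += C[j]         # Margin_k += C_j
    return joint, margin

def closed_form(stream, k, H, C):
    """Evaluate the same quantities from the frequency tables f, f_1..f_k."""
    f = Counter(stream)
    f_marg = [Counter(t[s] for t in stream) for s in range(k)]
    joint = sum(cnt * C[t[-1]] * math.prod(H[s][t[s]] for s in range(k - 1))
                for t, cnt in f.items())
    margin = [sum(c * H[s][x] for x, c in f_marg[s].items()) for s in range(k - 1)]
    margin.append(sum(c * C[x] for x, c in f_marg[k - 1].items()))
    return joint, margin

def demo(seed=0, n=4, k=3, m=50):
    """Build a random stream (assumed setup) and return both evaluations."""
    rng = random.Random(seed)
    H = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k - 1)]
    C = [cauchy(rng) for _ in range(n)]
    stream = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(m)]
    return one_pass(stream, k, H, C), closed_form(stream, k, H, C)
```

Calling `demo()` returns the streaming values and the closed-form values, which should agree up to floating-point rounding. The sketch stores the Cauchy variables explicitly, so it does not reflect the memory bounds discussed in the text.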

In the remainder of this paper, we assume that $w = O(kn)$. A $w$-truncated Cauchy variable $X$ is a
modified Cauchy variable $Y$ such that $X = -w\mathbf{1}_{Y < -w} + Y\mathbf{1}_{-w \le Y \le w} + w\mathbf{1}_{Y > w}$.
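As a minimal sketch of the truncation just described, assuming that a standard Cauchy variable is sampled by inverting its distribution function (the helper names `truncate` and `truncated_cauchy` are illustrative and not from the paper):

```python
import math
import random

def truncate(y, w):
    """Clip y to [-w, w]: -w if y < -w, w if y > w, and y otherwise."""
    return max(-w, min(w, y))

def truncated_cauchy(w, rng):
    """Sample a w-truncated Cauchy variable from a standard Cauchy draw."""
    y = math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF Cauchy sample
    return truncate(y, w)
```

For example, `truncate(5.0, 2.0)` returns `2.0`, while values already inside the interval are returned unchanged.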

Definition 6.1. Let $C_{j,i}$, $j \in [t]$, $i \in [n]$ be independent random variables where $C_{1,*}$ are Cauchy and
$C_{j,*}$, $j > 1$ are $w$-truncated Cauchy variables. For every $i \in [n]^t$ define $C(i) = \prod_{j=1}^{t} C_{j,i_j}$. A product
sketch of a $t$-dimensional tensor $M$ (with entries $m_i$, $i \in [n]^t$) is

$$C(M) = \sum_{i \in [n]^t} m_i C(i).$$
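The definition can be evaluated directly for a small explicit tensor. The following is a sketch under assumptions: the tensor is stored sparsely as a dictionary from 0-based index tuples to entries, and `product_sketch` is a hypothetical helper name. It computes the sum of $m_i \prod_j C_{j,i_j}$ from given values of the variables.

```python
import math

def product_sketch(M, C):
    """Return C(M) = sum_i m_i * prod_j C[j][i_j] for a sparse tensor M.

    M maps index tuples i (0-based here) to entries m_i;
    C[j][x] plays the role of C_{j+1, x+1} in Definition 6.1.
    """
    return sum(m * math.prod(C[j][x] for j, x in enumerate(idx))
               for idx, m in M.items())

# Fixed values in place of random draws, to make the arithmetic visible.
C_example = [[1.0, 2.0], [3.0, 4.0]]
M_example = {(0, 0): 1, (1, 1): 2}
sketch_value = product_sketch(M_example, C_example)  # 1*(1*3) + 2*(2*4) = 19
```

The sketch is linear in the tensor entries, which is what allows it to be maintained under stream updates.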



Lemma 6.2. It is possible to generate in one pass a product sketch of a tensor $T_{s'}(W(M_{Ind}, H_1, \dots, H_s))$
for any $0 \le s' \le s \le k$.

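Proof. Generate random variables $C_{j,i}$, $j \in [k - s']$, $i \in [n]$ as in Definition 6.1. Consider $k + 1$ variables
$\mathit{Joint}, \mathit{Margin}_1, \dots, \mathit{Margin}_k$ initially zero and updated as follows upon receiving a $k$-tuple $i \in [n]^k$: compute

$$\mathit{Joint} = \mathit{Joint} + \prod_{j \in [s]} H_j(i_j) \prod_{j \in [k-s']} C_{j,i_{j+s'}},$$

and for $j \le s'$

$$\mathit{Margin}_j = \mathit{Margin}_j + H_j(i_j);$$

and for $j > s$

$$\mathit{Margin}_j = \mathit{Margin}_j + C_{j-s',i_j};$$

and for $s' < j \le s$

$$\mathit{Margin}_j = \mathit{Margin}_j + H_j(i_j)C_{j-s',i_j}.$$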



At the end, we also compute $\mathit{Product} = \prod_{j=1}^{k} \mathit{Margin}_j$. We consider the quantity $m^k \mathit{Joint} - \mathit{Product}$
written in the form $\sum_{i \in [n]^{k-s'}} C(i)\mathit{Coef}(i)$. Our goal is to compare $\mathit{Coef}(i)$ with the entries of the tensor
$T_{s'}(W(M_{Ind}, H_1, \dots, H_s))$. Let $i \in [n]^{k-s'}$ be fixed. For $\mathit{Joint}$, a coefficient that corresponds to $C(i)$ is
equal to:

$$\sum_{j \in [n]^{s'}} f(j,i) \Big(\prod_{l=1}^{s'} H_l(j_l)\Big) \Big(\prod_{l=s'+1}^{s} H_l(i_{l-s'})\Big).$$

For $\mathit{Product} = \prod_j \mathit{Margin}_j$, a coefficient that corresponds to $C(i)$ is equal to:

$$\sum_{j \in [n]^{s'}} \Big(\prod_{l=1}^{s'} H_l(j_l)\Big) \Big(\prod_{l=s'+1}^{s} H_l(i_{l-s'})\Big) \prod_{l=1}^{s'} f_l(j_l) \prod_{l=s'+1}^{k} f_l(i_{l-s'}) = m^k \sum_{j \in [n]^{s'}} P_{product}((j,i)) \Big(\prod_{l=1}^{s'} H_l(j_l)\Big) \Big(\prod_{l=s'+1}^{s} H_l(i_{l-s'})\Big).$$

Thus, the coefficient of $C(i)$ in $m^k \mathit{Joint} - \mathit{Product}$ is

$$\sum_{j \in [n]^{s'}} m_{(j,i)} \Big(\prod_{l=1}^{s'} H_l(j_l)\Big) \Big(\prod_{l=s'+1}^{s} H_l(i_{l-s'})\Big).$$

On the other hand, consider $T_{s'}(W(M_{Ind}, H_1, \dots, H_s))$. The coefficient of $W(M_{Ind}, H_1, \dots, H_s)$ is
$m'_i = m_i \prod_{l=1}^{s} H_l(i_l)$. Thus, the coefficient of $T_{s'}(W(M_{Ind}, H_1, \dots, H_s))$ is, for $i \in [n]^{k-s'}$:

$$\sum_{j \in [n]^{s'}} m'_{(j,i)} = \sum_{j \in [n]^{s'}} m_{(j,i)} \Big(\prod_{l=1}^{s'} H_l(j_l)\Big) \Big(\prod_{l=s'+1}^{s} H_l(i_{l-s'})\Big).$$

Thus $m^k \mathit{Joint} - \mathit{Product}$ is the product sketch for $T_{s'}(W(M_{Ind}, H_1, \dots, H_s))$. It is important to note that
the procedure above works for $s' = 0$ as well.



□ 



Fact 6.3. Let $C_1, \dots, C_n$ be independent Cauchy variables and let $\alpha_1, \dots, \alpha_n$ be arbitrary random variables
independent of $C_1, \dots, C_n$. Then

i 

Proof. By stability, we have $\sum_j C_j \alpha_j \sim C|\alpha|$, where $C$ is a Cauchy variable. Thus,

p(|c| H <i W) <i/^4 

□ 

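Fact 6.4. Let $\{a_1, \dots, a_n\}$ be non-negative real numbers and let $X_i$, $i \in [n]$ be non-negative random variables
such that $P(X_i \le a_i) \le \frac{1}{q}$. Let $X = \sum_{i \in [n]} X_i$. Then

$$P\Big(X \le \frac{1}{2} \sum_i a_i\Big) \le \frac{2}{q}.$$

Proof. Let $Y_i = a_i$ if $X_i > a_i$, and $Y_i = 0$ otherwise. Then

$$E(Y_i) = a_i P(X_i > a_i) \ge a_i\Big(1 - \frac{1}{q}\Big).$$

Let $Z_i = a_i - Y_i$. Then $Z_i \ge 0$ and $E(Z_i) \le \frac{a_i}{q}$. Let $Y = \sum_i Y_i$, $Z = \sum_i Z_i$. Then by Markov inequality,

$$P\Big(Z \ge \frac{q'}{q} \sum_i a_i\Big) \le \frac{1}{q'}.$$

Thus, since $Y = \sum_i a_i - Z$ and $X \ge Y$,

$$P\Big(X \le \Big(1 - \frac{q'}{q}\Big) \sum_i a_i\Big) \le P\Big(Y \le \Big(1 - \frac{q'}{q}\Big) \sum_i a_i\Big) \le \frac{1}{q'}.$$

Putting $q' = \frac{q}{2}$, we obtain

$$P\Big(X \le \frac{1}{2} \sum_i a_i\Big) \le \frac{2}{q}.$$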



□ 
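A small exact computation can illustrate the kind of bound obtained here. The example is an assumption chosen for convenience and is not from the paper: each $X_i$ equals $0$ with probability $1/q$ and $2a_i$ otherwise, so that $P(X_i \le a_i) = 1/q$, and the hypothetical helper `exact_lower_tail` enumerates all outcomes to compute $P(X \le \frac{1}{2}\sum_i a_i)$ exactly, to be compared with $2/q$.

```python
from fractions import Fraction
from itertools import product

def exact_lower_tail(a, q):
    """P(X <= sum(a)/2) when X_i = 0 w.p. 1/q and X_i = 2*a_i otherwise, independently."""
    half = Fraction(sum(a), 2)
    total = Fraction(0)
    for zeros in product((True, False), repeat=len(a)):
        # Value of X and probability of this outcome pattern.
        x = sum(0 if z else 2 * ai for z, ai in zip(zeros, a))
        p = Fraction(1)
        for z in zeros:
            p *= Fraction(1, q) if z else Fraction(q - 1, q)
        if x <= half:
            total += p
    return total

tail = exact_lower_tail([1] * 6, 4)   # six variables, a_i = 1, q = 4
bound = Fraction(2, 4)                # the 2/q bound
```

In this instance the event requires at least five of the six variables to vanish, so the exact probability is far below the bound; the bound is not tight in general.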



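Lemma 6.5. Let $Y = \sum_{i \in [n]^k} \prod_{j=1}^{k} C_{j,i_j} m_i$ where all $C$ are Cauchy. For any $M$ with entries $m_i$ and for
$q > 3^k$ we have

$$P\Big(|Y| \le \frac{|M|}{(2q)^k}\Big) \le \frac{3^k}{q}.$$

Proof. We prove the claim by induction on $k$. For $k = 1$ we have by Fact 6.3

$$P\Big(\Big|\sum_{l \in [n]} C_{1,l} m_l\Big| \le \frac{|M|}{2q}\Big) \le \frac{3}{q}.$$

Consider $k > 1$. For simplicity of presentation, put $C_l = C_{1,l}$ and

$$X_l = \sum_{i \in [n]^{k-1}} \prod_{j=2}^{k} C_{j,i_j} m_{(l,i)}.$$

Then

$$Y = \sum_{l \in [n]} C_l X_l.$$

We have, by stability of the $C_l$s, that $\sum_{l \in [n]} C_l X_l \sim C' \sum_{l \in [n]} |X_l|$ where $C'$ is Cauchy distributed. Thus

$$P\Big(\Big|\sum_{l \in [n]} C_l X_l\Big| \ge \frac{|M|}{(2q)^k}\Big) = P\Big(|C'| \sum_{l \in [n]} |X_l| \ge \frac{|M|}{(2q)^k}\Big) \ge P\Big(|C'| \ge \frac{1}{q}, \ \sum_{l \in [n]} |X_l| \ge \frac{|M|}{2(2q)^{k-1}}\Big).$$

We have, by Fact 6.3,

$$P\Big(|C'| \ge \frac{1}{q}\Big) \ge 1 - \frac{1}{q}.$$

Denote by $M_l$ the $l$-th hyperplane of $M$. By induction for each $l$:

$$P\Big(|X_l| \le \frac{|M_l|}{(2q)^{k-1}}\Big) \le \frac{3^{k-1}}{q}.$$

Thus, by Fact 6.4,

$$P\Big(\sum_{l \in [n]} |X_l| \le \frac{1}{2} \frac{|M|}{(2q)^{k-1}}\Big) \le \frac{2 \cdot 3^{k-1}}{q}.$$

By union bound, and since $\frac{1}{q} + \frac{2 \cdot 3^{k-1}}{q} \le \frac{3^k}{q}$, the claim is correct. □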
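The arithmetic behind the inductive step can be checked mechanically. The sketch below assumes that the step combines a threshold $\frac{1}{q} \cdot \frac{1}{2(2q)^{k-1}}$ with failure probabilities $\frac{1}{q}$ and $\frac{2 \cdot 3^{k-1}}{q}$, which is a reading of the garbled source rather than a verified statement; the helper names are illustrative. It also evaluates the exact small-ball probability $P(|C| \le 1/q) = \frac{2}{\pi}\arctan(1/q)$ of a standard Cauchy variable.

```python
from fractions import Fraction
import math

def small_ball(q):
    """Exact P(|C| <= 1/q) for a standard Cauchy variable C."""
    return 2 / math.pi * math.atan(1 / q)

def induction_step_ok(k, q):
    """Check the threshold identity and the union-bound inequality exactly."""
    threshold_ok = (Fraction(1, q) * Fraction(1, 2 * (2 * q) ** (k - 1))
                    == Fraction(1, (2 * q) ** k))
    failure_ok = (Fraction(1, q) + Fraction(2 * 3 ** (k - 1), q)
                  <= Fraction(3 ** k, q))
    return threshold_ok and failure_ok
```

Both checks hold for every $k \ge 1$, since $3^k - 2 \cdot 3^{k-1} = 3^{k-1} \ge 1$ and $\arctan(x) \le x$ for $x \ge 0$.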

Corollary 6.6. Let $Y = \sum_{i \in [n]^k} \prod_{j=1}^{k} C_{j,i_j} m_i$ where all $C_{j,*}$, $j > 1$ are $w$-truncated Cauchy and all $C_{1,*}$
are Cauchy. For any $M$ with entries $m_i$ we have

$$P\Big(|Y| \le \frac{|M|}{200^k 3^{k^2}}\Big) \le \frac{1}{50}.$$

Proof. Consider an event that no $C$ is equal to $w$. Repeating the arguments from [32], the probability that
this event does not occur is bounded by

$$\frac{2kn}{\pi} \int_{w}^{\infty} \frac{1}{1 + x^2} \, dx \le \frac{2kn}{w\pi} \le \frac{1}{100}.$$

Thus, and by Lemma 6.5,

$$P\Big(|Y| \le \frac{|M|}{(2q)^k}\Big) \le \frac{1}{100} + \frac{1}{100}$$

for $q = 100 \cdot 3^k$. □
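The truncation estimate can be evaluated numerically. This is a sketch under assumptions: it takes at most $kn$ variables, uses the exact tail $P(|Y| \ge w) = \frac{2}{\pi}\arctan(1/w)$ of a standard Cauchy variable for $w > 0$, and picks $w = 200kn/\pi$ as one illustrative value for which $\frac{2kn}{w\pi} = \frac{1}{100}$. The function names are hypothetical.

```python
import math

def truncation_probability(w):
    """Exact P(|Y| >= w) for a standard Cauchy Y and w > 0."""
    return 2 / math.pi * math.atan(1 / w)

def union_bound(k, n, w):
    """Union bound on the probability that any of k*n variables hits the truncation level."""
    return k * n * truncation_probability(w)

k_example, n_example = 3, 1000
w_example = 200 * k_example * n_example / math.pi  # assumed choice of w
p_any_truncated = union_bound(k_example, n_example, w_example)
```

Since $\arctan(x) \le x$, the computed value never exceeds $\frac{2kn}{w\pi}$, which equals $0.01$ for this choice of $w$.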
Lemma 6.7. Let $M$ be an $s$-dimensional tensor for $s \le k$ and let $Y$ be a product sketch of $M$. I.e.,

$$Y = \sum_{i \in [n]^k} m_i \prod_{j=1}^{k} C_{j,i_j}$$

where for all $j \in [k]$, $i \in [n]$ the random variables $C_{j,i}$ are independent and $C_{1,*}$ are Cauchy and $C_{j,*}$, $j > 1$
are truncated Cauchy. Then $|Y|$ is a $\log^k n$-approximation of $|M|$ w.p. at least 0.97.

Proof. We consider $s = k$; the same arguments can be repeated for any $s < k$. Consider
$Y_l = |\sum_{i \in [n]^{k-1}} \prod_{j=2}^{k} C_{j,i_j} m_{(l,i)}|$, and let $Y' = \sum_{l \in [n]} Y_l$. Indyk [31] shows that for any $C$ with $w$-truncated
Cauchy distribution, it is true that $E(|C|) \le \log(w^2 + 1)/\pi + O(1)$. Thus, and by the independence of all
$C$s, we have:

$$E(Y_l) = E\Big(\Big|\sum_{i \in [n]^{k-1}} \prod_{j=2}^{k} C_{j,i_j} m_{(l,i)}\Big|\Big) \le \sum_{i \in [n]^{k-1}} E\Big(\prod_{j=2}^{k} |C_{j,i_j}|\Big) |m_{(l,i)}|.$$

Thus, by Markov inequality:

$$P(|Y'| > 300 \log^{k-1} n \, |M|) \le \frac{1}{100}.$$

Since $|Y| \le |Y'|$ the upper bound follows. The lower bound follows from Corollary 6.6 and since for large
enough $n$, $\log n \ge 200 \cdot 3^k$. □



Lemma 2.4. There exists an algorithm $A_{s_1,s_2}$ (for any $0 \le s_2 \le s_1 \le k$) that, given a data
stream $D$ and an access to hash functions $H_1, \dots, H_{s_1}$, in one pass obtains a $\log^k n$-approximation of
$|T_{s_2}(W(M_{Ind}, H_1, \dots, H_{s_1}))|$ using memory $O(\log(nm) \log \frac{1}{\delta})$.

Proof. By Lemma 6.2 it is possible to construct a product sketch for $|T_{s_2}(W(M_{Ind}, H_1, \dots, H_{s_1}))|$
in one pass. Also, by Lemma 6.7 the constructed product sketch is a $\log^k n$-approximation of
$|T_{s_2}(W(M_{Ind}, H_1, \dots, H_{s_1}))|$ w.p. $\Omega(1)$. Thus, taking a median of $O(\log \frac{1}{\delta})$ independent product sketches
results in a $(\log^k n, \delta)$-approximation. It remains to analyze the memory bounds. Repeating the arguments
from [32], each product sketch can be constructed with sufficient precision using $O(k \log nm)$ memory bits.
Also, the perfectly random variables can be replaced by pseudorandom variables using the "sorting"
argument from [32] (Section 3.2). We repeat the arguments of Indyk and McGregor $k$ times (instead of two
as in [32]). □
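The median step can be made concrete with a short computation. The sketch assumes that each independent sketch fails (falls outside the target range) with probability at most `p_bad < 1/2` and that an odd number `r` of sketches is used, so the median can fail only if at least `(r + 1) / 2` of them fail. The helper names are hypothetical, and the code only evaluates this binomial tail; it does not build the sketches themselves.

```python
import math

def median_failure_bound(r, p_bad):
    """Upper bound on P(median of r trials is bad) for odd r: at least (r+1)/2 bad trials."""
    need = (r + 1) // 2
    return sum(math.comb(r, b) * p_bad ** b * (1 - p_bad) ** (r - b)
               for b in range(need, r + 1))

def repetitions_for(delta, p_bad):
    """Smallest odd r whose median failure bound is at most delta."""
    r = 1
    while median_failure_bound(r, p_bad) > delta:
        r += 2
    return r
```

For instance, with `p_bad = 0.1` three repetitions already bring the bound to `0.028`. The bound decays exponentially in `r`, which is why a number of repetitions logarithmic in $1/\delta$ suffices.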